Refrying PDF with subset embedded fonts fixes text extraction
Hi All,
I know it is not a good idea to (just) refry PDF files (PDF -> EPS -> PDF). Especially when the PDF contains subset embedded fonts. Chances are you will end up with a PDF file which does not contain valid (searchable) text.
I did not know the opposite could also be true. The following zip file contains 2 PDF files, each containing two words: the original and the refried version.
Refried.zip
When selecting text from the original PDF file (using Acrobat 6 through X), the selection contains incorrect text, in this case invalid capitals. If I try the same in the refried version, the extracted text is correct.
It seems strange to me that a process which can only result in loss of information "fixes" this text issue. Somewhere the correct text must be hidden in the original PDF file. Not only capitals seem to be affected but also random characters, which seem to be fixed once refried.
Could anyone think of an explanation?
Is there a workaround without having to refry the PDF (refrying often results in loss of information)? I have no influence on the PDF files I receive, therefore I cannot embed the full fonts.
I am using the C++ SDK for Acrobat to write plugins.
Any pointers would be great!
Kind regards,
Robert
Thanks again for your reply,
Your explanation makes sense.
I went ahead and removed the ToUnicode CMap just to see what would happen:
if (CosDictKnown (cosFont, ASAtomFromString ("ToUnicode")))
CosDictRemove (cosFont,ASAtomFromString ("ToUnicode"));
As you predicted this fixes some issues and introduces new ones.
The results differed from the refry method: in some cases the refried PDF did not contain extractable text; in other cases the PDF without the ToUnicode CMap had no extractable text.
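For anyone reading along who has not looked inside a ToUnicode entry, here is a minimal sketch in Python (not the Acrobat SDK) of how character codes get turned into extracted text through a ToUnicode CMap. The sample CMap, the single-byte codes and the function names are all made up for illustration, and real CMaps also use bfrange sections, which this ignores. It only shows how one wrong entry could produce a wrong capital in the extracted text while the glyph drawn on the page stays correct.

```python
import re


def parse_bfchar(cmap_text):
    """Collect <code> <unicode> pairs from the bfchar sections of a ToUnicode CMap."""
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # The destination string is UTF-16BE, written as hex digits.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def extract(codes, mapping):
    """Map each single-byte character code to text; unknown codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Hypothetical CMap: code 01 draws "h" and maps to "h" (correct),
# code 02 draws "i" but is mapped to "I" (wrong entry).
SAMPLE_CMAP = """
2 beginbfchar
<01> <0068>
<02> <0049>
endbfchar
"""

mapping = parse_bfchar(SAMPLE_CMAP)
text = extract(b"\x01\x02", mapping)
```

With the made-up CMap above the page would show "hi" but the extracted text would be "hI". Removing the CMap forces the viewer to fall back on other information (encoding, glyph names), which would fit the observation that it helps in some files and hurts in others.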
Maybe I could combine the information from the different text extraction methods to make an educated guess as to which one (or which combination) is best :S
I suppose looking at individual text runs (with all their complexity) would not help me either...
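The "educated guess" idea could be sketched roughly like this in Python. Everything here is an assumption for illustration: the scoring heuristic (penalise replacement and control characters, and capitals that directly follow a lowercase letter), the function names and the sample strings are invented, and a real plugin would have to do the equivalent through the Acrobat SDK.

```python
import unicodedata


def plausibility(text):
    """Score in [0, 1]; higher means the text looks more like ordinary prose."""
    if not text:
        return 0.0
    bad = 0
    previous = " "
    for ch in text:
        category = unicodedata.category(ch)
        is_junk = ch == "\ufffd" or (category in ("Cc", "Co", "Cn") and ch not in "\n\t")
        if is_junk:
            bad += 1
        elif ch.isupper() and previous.islower():
            # A capital in the middle of a word, e.g. "wOrd".
            bad += 1
        previous = ch
    return 1.0 - bad / len(text)


def best_extraction(candidates):
    """Return the name of the extraction method whose output scores highest."""
    return max(candidates, key=lambda name: plausibility(candidates[name]))


# Hypothetical outputs of three extraction routes for the same two words.
candidates = {
    "original": "hELLO wORLD",
    "refried": "Hello World",
    "without_tounicode": "",
}
winner = best_extraction(candidates)
```

A heuristic like this will misjudge legitimate mixed-case words (product names, camelCase), so it would only be a tie-breaker between candidates, not proof that one extraction is correct.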
Kind regards,
Robert
Similar Messages
-
I am on a Windows 7 OS attempting to reduce PDF size with Adobe Acrobat Standard XI and Pro. The application keeps timing out at the "Subsetting embedded fonts" step, reports "Adobe Acrobat has stopped working", and then closes. The document is 275 pages. Is there something I can do to stop this?
Hi Ricci,
How long have you been facing this issue? Did you try a system restore to a date before this problem occurred?
Does Acrobat stop working when you open this specific PDF file, or with any PDF file that you open?
Regards,
Rahul
-
Problem with PDF export and embedded font (characters disappear)
Designer: Crystal Reports 2008 SP 2
Engine: CR4E 2.0 SP2 (runtime_12.2.203)
Hi there!
We found a problem in the PDF export. It seems to be a problem with the embedded fonts, as follows:
Take an Rpt file with, for example, only a text box which contains the German string " Änderungs Schlüssel ".
Export the Rpt file with CR4E to a PDF file.
When we open the PDF file in Adobe Reader 8, the text appears to be correct,
but if we print the PDF file from Adobe Reader, the text changes to " nderungs Schl ssel ":
here we are missing the German umlauts.
When we open the file with an alternative PDF reader such as Foxit Reader, they are also missing.
I found some posts here in the forum from people facing the same problem. Since I couldn't find a solution in the forum, we built a little workaround that works for us.
For all of you that have the same problem, here is the workaround:
We used the iText Java library; this JAR can help us fix the PDF file so the text is displayed correctly.
Here is the code:
ReportClientDocument doc = new ReportClientDocument();
doc.setReportAppServer(ReportClientDocument.inprocConnectionString);
// backslashes in Java string literals must be escaped
doc.open("C:\\XY.rpt", OpenReportOptions._openAsReadOnly);
//... database logon,.....
InputStream inputStream = doc.getPrintOutputController().export(ReportExportFormat.PDF);
inputStream = PDFHealer.heal(inputStream);
//... write the stream somewhere
The helper class using iText:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.pdf.PdfContentByte;
import com.lowagie.text.pdf.PdfImportedPage;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
public class PDFHealer
{
    public static InputStream heal(InputStream in) throws DocumentException, IOException
    {
        try
        {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            PdfReader reader = new PdfReader(in);
            // we retrieve the total number of pages
            int n = reader.getNumberOfPages();
            // step 1: creation of a document-object
            Document document = new Document();
            // step 2: we create a writer that listens to the document
            PdfWriter writer = PdfWriter.getInstance(document, out);
            // step 3: we open the document
            document.open();
            // step 4: we add content (page numbers are 1-based)
            PdfContentByte cb = writer.getDirectContent();
            int i = 0;
            while( i < n )
            {
                document.newPage();
                i++;
                PdfImportedPage page1 = writer.getImportedPage(reader, i);
                cb.addTemplate(page1, 0, 0);
            }
            // step 5: we close the document
            document.close();
            ByteArrayInputStream ret = new ByteArrayInputStream(out.toByteArray());
            out.close();
            return ret;
        }
        finally
        {
            // always release the source stream, even if the rewrite fails
            in.close();
        }
    }
}
-
I need to embed a standard font (such as Times New Roman) and make it so that the document opens with this font presented. I see information on embedding fonts, but nothing related to the second part of my question. Thanks.
The second part is automatic. If you manage to embed a font (for example with Preflight in Acrobat Pro), then a PDF viewer should use that. In some cases a locally installed font of the SAME NAME might take precedence, which is only a difficulty if a privately edited font is used.
-
Text display issues with htmlText, Embedded Font
Hey All,
I'm having an issue with the display of my hyperlinks in a
text field that is using embedded fonts. The hyperlinks are offset
to the left along the line they are on, and the underline doesn't
stretch all the way under the text field. The text displays normally
when I don't embed the font. I think this might
have to do with the embedded character range, so I opened it up.
Does anyone know if this range is enough, or if there are special
characters Flash uses that need to be embedded for their width even
though they aren't displayed?
Embedded range:
[Embed(source='MyriadPro-Regular.otf', fontName='Myriad Pro',
unicodeRange='U+0000-U+00fe')]
Here's a link to the related code:
http://pcpnew.privatepaste.com/a7eKaAuCTt
Any ideas?
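One way to narrow down the "is this range enough" question is to check which characters in the displayed text fall outside the embedded range. The following is only a sketch in Python under the assumption that the range string uses the comma-separated U+XXXX-U+YYYY form shown above; the function names and sample strings are invented, and it cannot tell whether Flash needs extra, non-displayed characters for layout.

```python
def parse_unicode_range(spec):
    """Turn 'U+0020-U+007E,U+00A1' into a list of (start, end) code point pairs."""
    ranges = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        low, _, high = part.partition("-")
        start = int(low.strip().upper().replace("U+", ""), 16)
        # A single code point such as 'U+00A1' is a range of length one.
        end = int(high.strip().upper().replace("U+", ""), 16) if high else start
        ranges.append((start, end))
    return ranges


def uncovered(text, spec):
    """Return the sorted characters of text that the embedded range does not cover."""
    ranges = parse_unicode_range(spec)
    return sorted({ch for ch in text if not any(a <= ord(ch) <= b for a, b in ranges)})


EMBEDDED = "U+0000-U+00fe"
missing_plain = uncovered("Visit our site", EMBEDDED)
missing_dash = uncovered("Visit \u2013 our site", EMBEDDED)
```

Here the plain string is fully covered, while the en dash (U+2013) in the second string lies outside U+0000-U+00fe and would be reported as missing.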
-
Using HTML text with an embedded font in Flex 4
I have spent a day searching the interwebs and have not found a working example of how to use an embedded font with HTML formatting.
Anybody know if it can even be done??
Should be doable, but all fonts used in the HTML have to be embedded, and
one of the fonts should be specified as the fontFamily for the component.
-
How to set fontFamily with an embedded font of a textFlow ?
Hi,
When I create a TextFlow without using any component of the Flex SDK (4.0.13827) and then try to change or apply a fontFamily of an embedded font, it doesn't work. Whereas when I use a component like RichEditableText or Label, it works.
Below is the code I wrote for my test:
<?xml version="1.0" encoding="utf-8"?>
<s:WindowedApplication xmlns:fx="http://ns.adobe.com/mxml/2009"
xmlns:s="library://ns.adobe.com/flex/spark"
xmlns:mx="library://ns.adobe.com/flex/mx"
creationComplete="creationCompleteHandler(event)"
width="800" height="600"
>
<fx:Style>
@namespace s "library://ns.adobe.com/flex/spark";
@namespace mx "library://ns.adobe.com/flex/mx";
@namespace local "*";
@font-face {
    src: url("assets/Fonts/arial.ttf");
    fontFamily: ArialEmbedded;
    advancedAntiAliasing: true;
    cff: true;
    unicodeRange: U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,U+0061-U+007A,U+007B-U+007E,U+00A1-U+00FF,U+2000-U+206F,U+20A0-U+20CF,U+2100-U+2183;
}
@font-face {
    src: url("assets/Fonts/cour.ttf");
    fontFamily: CourierEmbedded;
    advancedAntiAliasing: true;
    cff: true;
    unicodeRange: U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,U+0061-U+007A,U+007B-U+007E,U+00A1-U+00FF,U+2000-U+206F,U+20A0-U+20CF,U+2100-U+2183;
}
s|WindowedApplication {
}
</fx:Style>
<fx:Script>
<![CDATA[
import flash.text.Font;
import flash.text.engine.FontLookup;
import flashx.textLayout.container.ContainerController;
import flashx.textLayout.conversion.TextConverter;
import flashx.textLayout.edit.EditManager;
import flashx.textLayout.edit.IEditManager;
import flashx.textLayout.elements.TextFlow;
import flashx.textLayout.events.SelectionEvent;
import flashx.textLayout.formats.TextLayoutFormat;
import flashx.undo.UndoManager;
import mx.collections.ArrayCollection;
import mx.events.FlexEvent;
import spark.core.SpriteVisualElement;
import spark.events.IndexChangeEvent;
private var dynTextFlow : TextFlow;
private var ctTextFlow : TextFlow;
protected function creationCompleteHandler(event:FlexEvent):void
{
    controlBarVisible=false;
    dynTextFlow = TextConverter.importToFlow("Hello World", TextConverter.PLAIN_TEXT_FORMAT);
    drawTextBloc(dynTextFlow);
    dynTextFlow.addEventListener(SelectionEvent.SELECTION_CHANGE, selectionChangeListener);
    dynTextFlow.fontFamily = "ArialEmbedded";
    dynTextFlow.fontLookup = FontLookup.EMBEDDED_CFF;
    dynTextFlow.fontSize = 24;
    dynTextFlow.interactionManager = new EditManager(new UndoManager());
    dynTextFlow.flowComposer.updateAllControllers();
    dynTextFlow.invalidateAllFormats();
    dynTextFlow.flowComposer.updateAllControllers();
}
protected function cbFont_creationCompleteHandler(event:FlexEvent):void
{
    var fonts:ArrayCollection=new ArrayCollection(Font.enumerateFonts());
    cbFont.dataProvider=fonts;
}
protected function cbFont_changeHandler(event:IndexChangeEvent):void
{
    var cf : TextLayoutFormat = new TextLayoutFormat();
    cf.fontLookup = FontLookup.EMBEDDED_CFF;
    cf.fontFamily = ComboBox(event.currentTarget).selectedItem.fontName;
    IEditManager(ctTextFlow.interactionManager).applyLeafFormat(cf);
    ctTextFlow.interactionManager.setFocus();
}
private function drawTextBloc(txt : TextFlow) : void
{
    var container : SpriteVisualElement = new SpriteVisualElement();
    var controller : ContainerController = new ContainerController(container, 300, 200);
    addElement(container);
    txt.fontLookup = FontLookup.EMBEDDED_CFF;
    txt.fontFamily = "ArialEmbedded";
    txt.flowComposer.addController(controller);
}
private function selectionChangeListener(event : SelectionEvent) : void
{
    ctTextFlow = event.currentTarget as TextFlow;
}
protected function txt_selectionChangeHandler(event:FlexEvent):void
{
    ctTextFlow = (event.currentTarget as RichEditableText).textFlow;
}
]]>
</fx:Script>
<fx:Declarations>
<!-- Place non-visual elements (e.g., services, value objects) here -->
</fx:Declarations>
<s:layout>
<s:VerticalLayout paddingLeft="10" paddingTop="10"/>
</s:layout>
<s:RichEditableText x="10"
y="10"
selectionChange="txt_selectionChangeHandler(event)"
paddingTop="5" paddingLeft="5" paddingRight="5" paddingBottom="5"
id="txt"
fontFamily="CourierEmbedded"
text="RichEditableText"
height="200"
width="350"/>
<s:ComboBox id="cbFont"
labelField="fontName"
creationComplete="cbFont_creationCompleteHandler(event)"
change="cbFont_changeHandler(event)"
/>
<s:Label text="TEST" fontFamily="CourierEmbedded" fontSize="22" rotation="45"/>
</s:WindowedApplication>
Please, help me...
Thank you very much...
Thank you very much,
I finally found the solution using the swfContext :
use namespace mx_internal;
myTextFlow.swfContext = ISWFContext(getFontContext("myFontName", false, false, FontLookup.EMBEDDED_CFF));
It works fine both with a dynamic component like RichEditableText or a dynamic textflow with a ContainerController.
Thanks
-
External Embedded Fonts, Dynamic Text Fields, Latest?
Hi all,
I'm stuck in Flash8 land. Mostly because I use mProjector and
MDM Zinc to extend flash projectors and neither support AS3
correctly to date. That said..
Is a Flash8 SWF capable of using fonts embedded in "other"
SWFs? I realize this is an ongoing difficulty and a well known area
of confusion and well, I'm confused. All the attempts I've made so
far in linking have succeeded or failed in various ways, but never
fully work.
What I'd love to do is use dynamic text fields populated by
data from a database (or XML file), with CSS styling, using fonts
that are embedded in a 'master font SWF'. (and I'd like ice cream
with that too!)
I import fonts into, say, 'shared.fla'. I set them all up for
exporting via linkage (to shared.swf). I open up my other FLAs
(say, main.fla) and I drag the fonts from the shared.fla library
into main.fla's library. In looking at the linkage, I see it
properly set it to Import for Runtime Sharing (shared.swf). I can
see the font available in the main.fla font list and can select it
and use it just fine. I have to set the dynamic text field to embed
fonts to actually see them (and also
myTextField.setStyle("embedFonts",true)).
Now all that works well and fine, but the kicker is when I
want CSS to style my text. If I specify an embedded font linkage
identifier in CSS, the text disappears. I.e. I load 'style.css' and
I have h1 { font-family: someEmbeddedFont; }, the <h1> text
will now disappear.
Any clues on how I can specify a font to use in CSS that's
embedded so it'll work, WITHOUT this font needing to be embedded in
the actual library (as in, not a linked asset)? Because it works
fine if I embed the font into every single SWF. But when I try to
use it as a shared asset, this doesn't work.
Any ideas on how someone can achieve this?
This is so I can change my shared.swf and supply all new
fonts with the same linkage identifiers and change the font in a
whole project without re-exporting any other SWFs.
Thanks for any info!
I guess it's amazing but I honestly, wholly cannot get this
to work.
I made a new AS2 Flash 8 FLA (Forte.fla) with only the Forte
font with a size of 22 in the library. The name of the library
element was Forte. The linkage was set to "Export for Actionscript"
and "Export in first frame".
I made another FLA (main.fla) Flash8 AS2. In actionscript I
created an empty movie clip named "Asset_Forte" at the next highest
depth and Asset_Forte.loadMovie("Forte.swf"). I also made a dynamic
text field on the stage of this main.fla document and set it to
Arial 22pt (no bold or italics, etc). I did not embed anything into
it.
I made a TextFormat object (my_fmt) and set my_fmt.font =
"Forte";. I put some text in the dynamic text field to start so I
just applied the formatting (status_text.setTextFormat(my_fmt);).
This did not work. I started adjusting random things like
naming the font in the Forte.swf library to Forte22 and tried
my_fmt.font = "Forte22";. That didn't work.
I adjusted the linkage to "Export for runtime sharing" and
specified Forte.swf as the SWF to share from. This did not work.
I then dragged the font from the Forte.fla's library (while
"Export for runtime sharing" was enabled) into the library of
main.fla. I checked the link and it was proper, "Import for runtime
sharing, Forte.swf". I used both linkage attempts again (Forte and
Forte22) with my_fmt.font and neither worked.
Would it be possible at all to get a couple of FLAs from you
that exemplify how you do this particular trick? I can't seem to
get the settings right. I would be indebted to you!
-
Create a PDF with all the fonts embedded in InDesign CS6
Hi,
I created a PS file from InDesign CS6 & CS7 with all fonts embedded,
dropped it onto Distiller, and made a PDF. The fonts then seem to be converted to outlines, but only at random: I have done other files for this job with no dramas at all, using the same fonts, InDesign, and Distiller settings.
Any ideas to resolve this?
Hi Evoteam,
Please check the Distiller PDF settings and make sure the Embed fonts option has been checked.
Hope this helps.
Regards,
Sumit Singh
-
Name Exported PDFs with a filename generated from text frame on page
Does anyone have an idea how to do the following? We are on InDesign CS5 on a Mac running OSX 10.6
We plan on creating a document of, say, 100 pages. On each page will be a photo of a product and some text frames. In one of the frames will be the product's SKU code, which we will enter manually.
What I want is that we can then export each page as a separate PDF and the filename of each PDF will be taken from the SKU code present in the text frame with .pdf appended.
Has anyone done something like this?
Thanks
Ask in the Scripting forum... InDesign Scripting
-
Applying Ext. Embedded Fonts to Text
Hi All,
I am spinning here. I believe I am missing the intermediate step of getting/creating a font obj to apply to the text format. The TLFTextField (instance name Label) was created on stage and has _typewriter font set. It is in movieclip instance mc and is rotated.
AS3 CS5 Player 10
The text is substituted but the Arial Unicode MS font is not applied.
import flash.display.MovieClip;
import com.greensock.*;
import com.greensock.loading.SWFLoader;
import com.greensock.events.LoaderEvent;
import fl.text.TLFTextField;
import flash.text.TextFormat;
import flash.text.Font;
var loader:SWFLoader = new SWFLoader("Code/FontTLF.swf", {onComplete:completeHandler});
loader.load();
function completeHandler(event:LoaderEvent):void {
    var fontClass:Class = loader.getClass("FontClassName");
    var fontArray:Array = Font.enumerateFonts(true);
    var formatMC:TextFormat = new TextFormat();
    for(var i:int = 0; i < fontArray.length; i++) {
        var font:Font = fontArray[i];
        if (font.fontName == "Arial Unicode MS") {
            trace("name: " + font.fontName);//name: Arial Unicode MS
            trace("typeface: " + font.fontStyle);//typeface: regular
            formatMC.font = font.fontName;//I may need an intermediate step here??
            break;
        }
    }
    trace (formatMC.font);//Arial Unicode MS
    mc.Label.text = "New S"; // Rotated MC with text displays as Times s/b Arial...
    mc.Label.defaultTextFormat = formatMC;
    mc.Label.embedFonts = true;
}
Thanks,
Jim
Yes, I believe you are right but ...
Here's a common TLFText example I am referencing.
import fl.text.TLFTextField;
import flash.text.TextFieldAutoSize;
import flash.text.TextFormat;
var fmt:TextFormat = new TextFormat();
fmt.color = 0xFF0000; // red
fmt.font = "Arial"; fmt.size = 32;
var tlfTxt:TLFTextField = new TLFTextField();
tlfTxt.defaultTextFormat = fmt;
tlfTxt.border = true;
tlfTxt.text = "The quick brown fox jumps over the lazy dog.";
tlfTxt.wordWrap = true;
tlfTxt.width = 300;
tlfTxt.autoSize = TextFieldAutoSize.LEFT;
tlfTxt.x = tlfTxt.y = 40;
addChild(tlfTxt);
It also runs out of sequence, but I am cleaning that up.
I am trying to use a method that allows me to use plain text strings w/o markup.
It is a CD-based captioned video player and uses a Director platform that has Flash Player 10.3 built in. 30-plus languages, some RTL and maybe BiDi.
-
Embedding Fonts breaks text display
I'm on Mac OS 10.5.6 (but saw this with earlier versions too)
using CS3 and/or CS4.
For work we publish back to Flash 5 player (don't ask) and
that is where I'm having the problem. If I put a dynamic text box
on the stage, add some text, embed the font, change the publish
setting to 5, and publish, I don't see the text. If the box is
selectable the player does show the I-beam when I move over where
it should be, so it does know the text box is there.
I sent the same file to one of my coworkers and he can
publish it just fine.
We tried replacing the fonts on my machine with those from
his and that didn't solve the problem.
I've tried using both CS3 and CS4 and get the same results.
If I change the publish settings to Flash 6 everything works
like it should.
Any ideas?
Tried replacing the Arial Narrow font with a different one
from another coworker. Still no dice.
Tried repairing permissions. Nope.
Tried making a new user and checking it there. Nope.
Moved the font from my Library to the system-wide library.
Bupkiss. (Is that how it is spelled?)
-
Is a scrollable text frame possible with hyperlinks embedded in the text?
I'm new to this whole digital design world. I have a document set up with several links and some active buttons, and I am struggling with this scrollable text frame. I finally have the frame working, but it only scrolls from the preview button in the Overlay Creator panel. From the Preview window, previewed on the Internet, or exported as a SWF, everything works but the scrolling frame. Anyone have any suggestions??
You can, but it's not very elegant. For one thing, whatever text you have will have to basically be on another webpage addressable by URL in the iframes expression. I guess it would work, though. It's just like having two pages, where maybe one would suffice, you know?
There are some javascript examples that I have come across that would probably work. The HTML tags seem to be unreliable....I have seen some that only work on IE and not for the mozilla based browsers.
Anyways, here's an example of a scrolling iframe....it's the messageboard from my Guestmap...
http://guestmap.dirtdoog.com
As you can see...it's not great, but it's workable.
-
Convert MS Word to PDF with an embedded Excel File Q
I have an MS Word document that contains embedded Excel and Word files (double-click these icons and you can view them). How can I convert this MS Word document to PDF and still be able to view these embedded files? I'm doing the straightforward 'createPDF' using PDFMaker; when it has finished creating the PDF you can obviously see these embedded images in the PDF file, but you can't access them. Can this be done?
Thanks
No. Convert these other documents to PDF and attach them to the new PDF file.
-
Creating PDF with SWF embedded
Hello all!
I'm trying to build a PDF file which contains a Financial Statement, for example, where I could see my balance, withdraw, things paid, etc.
I have all the information in database and I'd like to know if I can achieve this by developing a process in LC workbench.
This process will catch a SWF flex based layout, merge with data from a database and then build a PDF file with dynamic features like data ordering.
The question is, how could I merge a layout built in Flash Builder with data inside a process in Workbench?
Thanks in advance.
Diego
Yes, I have been doing that manually. I want to set up Acrobat to do this automatically for every PDF I generate. For example, I want every PDF to be automatically created to open with 100% magnification and the document title showing in the title bar instead of the file name. I don't want to have to do this manually. Is there some way to do this automatically when the PDF is generated?