Refrying a PDF with subset-embedded fonts fixes text extraction

Hi All,
I know it is not a good idea to (just) refry PDF files (PDF -> EPS -> PDF), especially when the PDF contains subset-embedded fonts. Chances are you will end up with a PDF file that does not contain valid (searchable) text.
I did not know the opposite could also be true. The following zip file contains two PDF files, each containing two words: the original and the refried version.
Refried.zip
When I select text in the original PDF file (using Acrobat 6 through X), the text is incorrect, in this case with wrong capitals. If I do the same in the refried version, the extracted text is correct.
It seems strange to me that a process which can only lose information "fixes" this text issue. The correct text must be hidden somewhere in the original PDF file. Not only capitals seem to be affected, but also random characters, which also seem to be fixed once refried.
Can anyone think of an explanation?
Is there a workaround that does not require refrying the PDF (refrying often results in loss of information)? I have no influence on the PDF files I receive, so I cannot embed the full fonts.
I am using the C++ SDK for Acrobat to write plugins.
Any pointers would be great!
Kind regards,
Robert
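A likely place for the "hidden" text is the font itself: when a font dictionary carries a /ToUnicode CMap, viewers use it to map character codes back to Unicode for copying and searching, so a wrong entry there yields wrong letters even though the glyphs render correctly. As a minimal sketch for inspecting such a CMap, assuming you have already pulled the stream out of the font dictionary and decompressed it (e.g. through the SDK's Cos layer), the Java class below parses the bfchar and bfrange sections into a code-to-Unicode map. The class name ToUnicodeDump is hypothetical, and the array form of bfrange is skipped.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: dump the code -> Unicode mappings of a ToUnicode CMap.
// Assumptions: the stream is already extracted and decompressed, bfrange
// destinations are single BMP code units (4 hex digits), and the array
// form of bfrange ("<lo> <hi> [<a> <b> ...]") is skipped.
final class ToUnicodeDump {
    private static final Pattern BFCHAR_BLOCK =
            Pattern.compile("beginbfchar(.*?)endbfchar", Pattern.DOTALL);
    private static final Pattern BFRANGE_BLOCK =
            Pattern.compile("beginbfrange(.*?)endbfrange", Pattern.DOTALL);
    private static final Pattern PAIR =
            Pattern.compile("<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>");
    private static final Pattern TRIPLE =
            Pattern.compile("<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>\\s*<(\\p{XDigit}+)>");

    static Map<Integer, String> parse(String cmap) {
        Map<Integer, String> map = new TreeMap<>();
        // bfchar entries: <srcCode> <dst as UTF-16BE hex>
        Matcher block = BFCHAR_BLOCK.matcher(cmap);
        while (block.find()) {
            Matcher m = PAIR.matcher(block.group(1));
            while (m.find()) {
                map.put(Integer.parseInt(m.group(1), 16), decodeUtf16(m.group(2)));
            }
        }
        // bfrange entries: <lo> <hi> <dst>, code lo+k maps to dst+k
        block = BFRANGE_BLOCK.matcher(cmap);
        while (block.find()) {
            Matcher m = TRIPLE.matcher(block.group(1));
            while (m.find()) {
                if (m.group(3).length() != 4) {
                    continue; // outside this sketch's BMP-only assumption
                }
                int lo = Integer.parseInt(m.group(1), 16);
                int hi = Integer.parseInt(m.group(2), 16);
                int dst = Integer.parseInt(m.group(3), 16);
                for (int code = lo; code <= hi; code++) {
                    map.put(code, String.valueOf((char) (dst + code - lo)));
                }
            }
        }
        return map;
    }

    // Decode a hex string of UTF-16BE code units (4 hex digits each).
    private static String decodeUtf16(String hex) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            sb.append((char) Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return sb.toString();
    }
}
```

Printing the returned map for the affected font and looking for codes whose Unicode value is a capital where the rendered glyph is lowercase (or vice versa) should show which entries are off.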

Thanks again for your reply,
Your explanation makes sense.
I went ahead and removed the ToUnicode CMap just to see what would happen:
       // Drop the font dictionary's /ToUnicode entry, if it has one
       if (CosDictKnown (cosFont, ASAtomFromString ("ToUnicode")))
         CosDictRemove (cosFont, ASAtomFromString ("ToUnicode"));
As you predicted this fixes some issues and introduces new ones.
The results differed from the refry method: in some cases the refried PDF did not contain extractable text, in other cases the PDF without the ToUnicode CMap had no extractable text.
Maybe I could combine the results of different text extraction methods to make an educated guess about which one (or which combination) is best :S
I suppose looking at individual text runs (with all their complexity) would not help me either...
Kind regards,
Robert
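Combining several extraction results could start from a simple plausibility score per candidate string. The sketch below is one hypothetical heuristic, not an Acrobat feature: it rewards letters, digits and whitespace, penalizes control and replacement characters, and penalizes an uppercase letter directly after a lowercase one, which is the stray-capital pattern seen in the original PDF. The class name and the weights are arbitrary assumptions to tune against real samples.

```java
import java.util.List;

// Hypothetical heuristic for choosing between extraction results of the same
// text (e.g. original PDF, refried PDF, PDF without ToUnicode). The weights
// are arbitrary assumptions, not anything Acrobat provides.
final class ExtractionVoter {
    static int score(String s) {
        int score = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c) || Character.isWhitespace(c)) {
                score += 2; // readable characters count in favour
            } else if (Character.isISOControl(c) || c == '\uFFFD') {
                score -= 5; // control or replacement characters: likely broken mapping
            }
            // An uppercase letter right after a lowercase one ("woRd") is the
            // kind of stray capital seen in the broken extraction.
            if (i > 0 && Character.isUpperCase(c) && Character.isLowerCase(s.charAt(i - 1))) {
                score -= 3;
            }
        }
        return score;
    }

    // Return the highest-scoring candidate; earlier candidates win ties.
    static String best(List<String> candidates) {
        String best = null;
        int bestScore = Integer.MIN_VALUE;
        for (String candidate : candidates) {
            int s = score(candidate);
            if (s > bestScore) {
                best = candidate;
                bestScore = s;
            }
        }
        return best;
    }
}
```

Legitimate camelCase words or product names would also be penalized, so in practice scoring per word or per text run is probably more robust than scoring a whole string.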

Similar Messages

  • Acrobat XI stops working while subsetting embedded fonts when reducing PDF size

    I am on Windows 7, attempting to reduce PDF size with my Adobe Acrobat Standard XI & Pro. The application keeps timing out at the "Subsetting embedded fonts" stage, then reports "Adobe Acrobat has stopped working" and closes. The document is 275 pages. Is there something I can do to stop this?

    Hi Ricci,
    Since when have you been facing this issue? Did you try a system restore to a date before this problem occurred?
    Does Acrobat stop working when you open this specific PDF file, or with any PDF file that you open?
    Regards,
    Rahul

  • Problem with PDF export and embedded font (characters disappear)

    Designer: Crystal Reports 2008 SP 2
    Engine: CR4E 2.0 SP2 (runtime_12.2.203)
    Hi there!
    We found a problem in the PDF export. It seems to be a problem with the embedded fonts. The problem is as follows:
    Take an .rpt file with, for example, only a text box containing the German string " Änderungs Schlüssel ".
    Export the .rpt file with CR4E to a PDF file.
    When we open the PDF file in Adobe Reader 8, the text appears to be correct,
    but if we print the PDF file from Adobe Reader, the text changes to " nderungs Schl ssel ",
    so we are missing the German umlauts.
    When we open the file in an alternative PDF reader such as Foxit Reader, they are also missing.
    I found some posts here in the forum from people facing the same problem, but since I couldn't find a solution, we built a little workaround that works for us.
    For all of you that have the same problem here the workaround:
    We used the iText Java library to fix the PDF file so the text is displayed correctly.
    Here is the code:
    ReportClientDocument doc = new ReportClientDocument();
    doc.setReportAppServer(ReportClientDocument.inprocConnectionString);
    doc.open("C:\\XY.rpt", OpenReportOptions._openAsReadOnly); // backslash must be escaped in a Java string
    // ... database logon ...
    InputStream inputStream = doc.getPrintOutputController().export(ReportExportFormat.PDF);
    inputStream = PDFHealer.heal(inputStream);
    // ... write the stream somewhere

    The helper class using iText:
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import com.lowagie.text.Document;
    import com.lowagie.text.DocumentException;
    import com.lowagie.text.pdf.PdfContentByte;
    import com.lowagie.text.pdf.PdfImportedPage;
    import com.lowagie.text.pdf.PdfReader;
    import com.lowagie.text.pdf.PdfWriter;
    public class PDFHealer {
       public static InputStream heal(InputStream in) throws DocumentException, IOException {
          try {
             ByteArrayOutputStream out = new ByteArrayOutputStream();
             PdfReader reader = new PdfReader(in);
             // we retrieve the total number of pages
             int n = reader.getNumberOfPages();
             // step 1: creation of a document-object, sized like the first source page
             Document document = new Document(reader.getPageSize(1));
             // step 2: we create a writer that listens to the document
             PdfWriter writer = PdfWriter.getInstance(document, out);
             // step 3: we open the document
             document.open();
             // step 4: we add content, copying every source page as an imported template
             PdfContentByte cb = writer.getDirectContent();
             for (int i = 1; i <= n; i++) {
                // match each output page to its source page (the iText default is A4)
                document.setPageSize(reader.getPageSize(i));
                document.newPage();
                PdfImportedPage page = writer.getImportedPage(reader, i);
                cb.addTemplate(page, 0, 0);
             }
             // step 5: we close the document
             document.close();
             return new ByteArrayInputStream(out.toByteArray());
          } finally {
             in.close();
          }
       }
    }

  • How can I embed fonts into a PDF? Additionally, how can I make the PDF then open with the embedded font?

    I need to embed a standard font (such as Times New Roman) and make it so that the document opens with this font presented. I see information on embedding fonts, but nothing related to the second part of my question. Thanks.

    The second part is automatic. If you manage to embed a font (for example with Preflight in Acrobat Pro), a PDF viewer should use it. In some cases a locally installed font of the SAME NAME might take precedence, which is only a problem if a privately edited font is used.

  • Text display issues with htmlText, Embedded Font

    Hey All,
    I'm having an issue with the display of my hyperlinks in a
    text field that is using embedded fonts. It offsets the hyperlinks
    to the left along the line they are on, and the underline doesn't
    stretch all the way under the text field. The text displays normally
    when I don't embed the font. For some reason I think this might
    have to do with the embedded character range, so I opened it up.
    Does anyone know if this range is enough, or if there are special
    characters Flash uses that need to be embedded for their width even
    though they aren't displayed?
    Embedded range:
    [Embed(source='MyriadPro-Regular.otf', fontName='Myriad Pro',
    unicodeRange='U+0000-U+00fe')]
    Here's a link to the related code:
    http://pcpnew.privatepaste.com/a7eKaAuCTt

    Any ideas?
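One thing worth checking with such a range: U+0000-U+00fe stops one code point short of the end of Latin-1 (U+00FF, ÿ), and typographic quotes, dashes and the ellipsis live in the General Punctuation block (U+2000-U+206F), which this range does not include. As a minimal sketch, assuming the unicodeRange value uses only comma-separated U+XXXX or U+XXXX-U+YYYY items, the hypothetical Java helper below lists the characters of a given text that a range string leaves out. It only checks coverage; it does not by itself explain the hyperlink offset.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: report the characters of a text that a Flash/Flex-style
// unicodeRange string does not cover. Only "U+XXXX" and "U+XXXX-U+YYYY"
// items separated by commas are handled (an assumption about the input).
final class UnicodeRangeCheck {
    static List<int[]> parse(String spec) {
        List<int[]> ranges = new ArrayList<>();
        for (String item : spec.split(",")) {
            String[] ends = item.trim().split("-");
            int lo = Integer.parseInt(ends[0].trim().substring(2), 16); // strip "U+"
            int hi = ends.length > 1 ? Integer.parseInt(ends[1].trim().substring(2), 16) : lo;
            ranges.add(new int[] {lo, hi});
        }
        return ranges;
    }

    // Returns each uncovered character once, in order of first appearance.
    static String missing(String text, String spec) {
        List<int[]> ranges = parse(spec);
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            boolean covered = false;
            for (int[] r : ranges) {
                if (cp >= r[0] && cp <= r[1]) {
                    covered = true;
                    break;
                }
            }
            String ch = new String(Character.toChars(cp));
            if (!covered && out.indexOf(ch) < 0) {
                out.append(ch);
            }
        });
        return out.toString();
    }
}
```

For example, missing("caf\u00E9 \u201Cquoted\u201D", "U+0000-U+00fe") returns just the two curly quotes, since é falls inside the range and they do not.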

  • Using HTML text with an embedded font in Flex 4

    I have spent a day searching the interwebs and have not found a working example of how to use an embedded font with html formatting.
    Anybody know if it can even be done??

    Should be doable, but all fonts used in the html have to be embedded, and
    one of the fonts should be specified as the fontFamily for the component.
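To follow the advice that every font used in the HTML must be embedded, it can help to list the font faces the markup actually references. A minimal Java sketch, assuming the HTML names fonts through face attributes (as on <font> tags) and that you supply the set of embedded fontFamily names yourself; the class name HtmlFontAudit is hypothetical, and the regex is not a full HTML parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: find font faces referenced in HTML text that are not in
// the caller-supplied set of embedded font family names.
final class HtmlFontAudit {
    // Matches face="..." or face='...' attributes; \b avoids e.g. "typeface=".
    private static final Pattern FACE =
            Pattern.compile("\\bface\\s*=\\s*[\"']([^\"']+)[\"']", Pattern.CASE_INSENSITIVE);

    static Set<String> notEmbedded(String html, Set<String> embedded) {
        Set<String> missing = new LinkedHashSet<>();
        Matcher m = FACE.matcher(html);
        while (m.find()) {
            // In HTML a face attribute may list several comma-separated families.
            for (String family : m.group(1).split(",")) {
                String name = family.trim();
                if (!name.isEmpty() && !embedded.contains(name)) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }
}
```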

  • How to set the fontFamily of a TextFlow to an embedded font?

    Hi,
    When I create a TextFlow without using any component of the Flex SDK (4.0.13827) and then try to change or apply the fontFamily of an embedded font, it doesn't work. When I use a component like RichEditableText or Label, however, it works.
    Below is the code I wrote for my test:
    <?xml version="1.0" encoding="utf-8"?>
    <s:WindowedApplication xmlns:fx="http://ns.adobe.com/mxml/2009"
                                xmlns:s="library://ns.adobe.com/flex/spark"
                                xmlns:mx="library://ns.adobe.com/flex/mx"
                                creationComplete="creationCompleteHandler(event)"
                                width="800" height="600"
                                >
         <fx:Style>
              @namespace s "library://ns.adobe.com/flex/spark";
              @namespace mx "library://ns.adobe.com/flex/mx";
              @namespace local "*";
              @font-face {
                   src: url("assets/Fonts/arial.ttf");
                   fontFamily: ArialEmbedded;
                   advancedAntiAliasing: true;
                   cff: true;
                   unicodeRange: U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,U+0061-U+007A,U+007B-U+007E,U+00A1-U+00FF,U+2000-U+206F,U+20A0-U+20CF,U+2100-U+2183;
              }
              @font-face {
                   src: url("assets/Fonts/cour.ttf");
                   fontFamily: CourierEmbedded;
                   advancedAntiAliasing: true;
                   cff: true;
                   unicodeRange: U+0020-U+002F,U+0030-U+0039,U+003A-U+0040,U+0041-U+005A,U+005B-U+0060,U+0061-U+007A,U+007B-U+007E,U+00A1-U+00FF,U+2000-U+206F,U+20A0-U+20CF,U+2100-U+2183;
              }
              s|WindowedApplication {
              }
         </fx:Style>
         <fx:Script>
              <![CDATA[
                   import flash.text.Font;
                   import flash.text.engine.FontLookup;
                   import flashx.textLayout.container.ContainerController;
                   import flashx.textLayout.conversion.TextConverter;
                   import flashx.textLayout.edit.EditManager;
                   import flashx.textLayout.edit.IEditManager;
                   import flashx.textLayout.elements.TextFlow;
                   import flashx.textLayout.events.SelectionEvent;
                   import flashx.textLayout.formats.TextLayoutFormat;
                   import flashx.undo.UndoManager;
                   import mx.collections.ArrayCollection;
                   import mx.events.FlexEvent;
                   import spark.core.SpriteVisualElement;
                   import spark.events.IndexChangeEvent;
                   private var dynTextFlow : TextFlow;
                   private var ctTextFlow : TextFlow;
                   protected function creationCompleteHandler(event:FlexEvent):void {
                        controlBarVisible = false;
                        dynTextFlow = TextConverter.importToFlow("Hello World", TextConverter.PLAIN_TEXT_FORMAT);
                        drawTextBloc(dynTextFlow);
                        dynTextFlow.addEventListener(SelectionEvent.SELECTION_CHANGE, selectionChangeListener);
                        dynTextFlow.fontFamily = "ArialEmbedded";
                        dynTextFlow.fontLookup = FontLookup.EMBEDDED_CFF;
                        dynTextFlow.fontSize = 24;
                        dynTextFlow.interactionManager = new EditManager(new UndoManager());
                        dynTextFlow.flowComposer.updateAllControllers();
                        dynTextFlow.invalidateAllFormats();
                        dynTextFlow.flowComposer.updateAllControllers();
                   }
                   protected function cbFont_creationCompleteHandler(event:FlexEvent):void {
                        var fonts:ArrayCollection = new ArrayCollection(Font.enumerateFonts());
                        cbFont.dataProvider = fonts;
                   }
                   protected function cbFont_changeHandler(event:IndexChangeEvent):void {
                        var cf : TextLayoutFormat = new TextLayoutFormat();
                        cf.fontLookup = FontLookup.EMBEDDED_CFF;
                        cf.fontFamily = ComboBox(event.currentTarget).selectedItem.fontName;
                        IEditManager(ctTextFlow.interactionManager).applyLeafFormat(cf);
                        ctTextFlow.interactionManager.setFocus();
                   }
                   private function drawTextBloc(txt : TextFlow) : void {
                        var container : SpriteVisualElement = new SpriteVisualElement();
                        var controller : ContainerController = new ContainerController(container, 300, 200);
                        addElement(container);
                        txt.fontLookup = FontLookup.EMBEDDED_CFF;
                        txt.fontFamily = "ArialEmbedded";
                        txt.flowComposer.addController(controller);
                   }
                   private function selectionChangeListener(event : SelectionEvent) : void {
                        ctTextFlow = event.currentTarget as TextFlow;
                   }
                   protected function txt_selectionChangeHandler(event:FlexEvent):void {
                        ctTextFlow = (event.currentTarget as RichEditableText).textFlow;
                   }
              ]]>
         </fx:Script>
         <fx:Declarations>
              <!-- Place non-visual elements (e.g., services, value objects) here -->
         </fx:Declarations>
         <s:layout>
              <s:VerticalLayout paddingLeft="10" paddingTop="10"/>
         </s:layout>
         <s:RichEditableText x="10"
                                  y="10"
                                  selectionChange="txt_selectionChangeHandler(event)"
                                  paddingTop="5" paddingLeft="5" paddingRight="5" paddingBottom="5"
                                  id="txt"
                                  fontFamily="CourierEmbedded"
                                  text="RichEditableText"
                                  height="200"
                                  width="350"/>
         <s:ComboBox id="cbFont"
                        labelField="fontName"
                        creationComplete="cbFont_creationCompleteHandler(event)"
                        change="cbFont_changeHandler(event)"
                        />
         <s:Label text="TEST" fontFamily="CourierEmbedded" fontSize="22" rotation="45"/>
    </s:WindowedApplication>
    Please help me...
    Thank you very much...

    Thank you very much,
    I finally found the solution using the swfContext:
    use namespace mx_internal;
    myTextFlow.swfContext = ISWFContext(getFontContext("myFontName", false, false, FontLookup.EMBEDDED_CFF));
    It works fine both with a component like RichEditableText and with a dynamic TextFlow driven by a ContainerController.
    Thanks

  • External Embedded Fonts, Dynamic Text Fields, Latest?

    Hi all,
    I'm stuck in Flash 8 land, mostly because I use mProjector and
    MDM Zinc to extend Flash projectors and neither supports AS3
    correctly to date. That said...
    Is a Flash 8 SWF capable of using fonts embedded in "other"
    SWFs? I realize this is an ongoing difficulty and a well-known area
    of confusion and, well, I'm confused. All the attempts I've made so
    far at linking have succeeded or failed in various ways, but never
    fully worked.
    What I'd love to do is use dynamic text fields populated by
    data from a database (or XML file), with CSS styling, using fonts
    that are embedded in a 'master font SWF'. (and I'd like ice cream
    with that too!)
    I import fonts into, say, 'shared.fla'. I set them all up for
    exporting via linkage (to shared.swf). I open up my other FLAs
    (say, main.fla) and I drag the fonts from the shared.fla library
    into main.fla's library. Looking at the linkage, I see it is
    properly set to Import for Runtime Sharing (shared.swf). I can
    see the font available in the main.fla font list and can select it
    and use it just fine. I have to set the dynamic text field to embed
    fonts to actually see them (and also
    myTextField.setStyle("embedFonts",true)).
    Now all that works well and fine, but the kicker is when I
    want CSS to style my text. If I specify an embedded font linkage
    identifier in CSS, the text disappears. For example, I load
    'style.css' with h1 { font-family: someEmbeddedFont; }, and the
    <h1> text then disappears.
    Any clues on how I can specify an embedded font in CSS so that
    it works, WITHOUT the font having to be embedded in the actual
    library (as in, not a linked asset)? It works fine if I embed the
    font into every single SWF, but when I try to use it as a shared
    asset, it doesn't work.
    Any ideas on how someone can achieve this?
    This is so I can change my shared.swf, supply all new fonts with
    the same linkage identifiers, and change the font in a whole
    project without re-exporting any other SWFs.
    Thanks for any info!

    I guess it's amazing but I honestly, wholly cannot get this
    to work.
    I made a new AS2 Flash 8 FLA (Forte.fla) with only the Forte
    font with a size of 22 in the library. The name of the library
    element was Forte. The linkage was set to "Export for Actionscript"
    and "Export in first frame".
    I made another FLA (main.fla), Flash 8 AS2. In ActionScript I
    created an empty movie clip named "Asset_Forte" at the next highest
    depth and called Asset_Forte.loadMovie("Forte.swf"). I also made a dynamic
    text field on the stage of this main.fla document and set it to
    Arial 22pt (no bold or italics, etc). I did not embed anything into
    it.
    I made a TextFormat object (my_fmt) and set my_fmt.font =
    "Forte";. I put some text in the dynamic text field to start so I
    just applied the formatting (status_text.setTextFormat(my_fmt);).
    This did not work. I started adjusting random things like
    naming the font in the Forte.swf library to Forte22 and tried
    my_fmt.font = "Forte22";. That didn't work.
    I adjusted the linkage to "Export for runtime sharing" and
    specified Forte.swf as the SWF to share from. This did not work.
    I then dragged the font from the Forte.fla's library (while
    "Export for runtime sharing" was enabled) into the library of
    main.fla. I checked the link and it was proper, "Import for runtime
    sharing, Forte.swf". I used both linkage attempts again (Forte and
    Forte22) with my_fmt.font and neither worked.
    Would it be possible at all to get a couple of FLAs from you
    that exemplify how you do this particular trick? I can't seem to
    get the settings right. I would be indebted to you!

  • Create a PDF with all the fonts embedded in InDesign CS6

    Hi,
    I created a PS file from InDesign CS6 & CS7 with all fonts embedded, dropped it onto Distiller and made a PDF, whereupon the fonts seem to be split into outlines. It happens randomly: I have done other files for this job with no dramas at all, using the same fonts, InDesign and Distiller settings.
    Any ideas to resolve this?

    Hi Evoteam,
    Please check the Distiller PDF settings and make sure the Embed fonts option is checked.
    Hope this helps.
    Regards,
    Sumit Singh

  • Name Exported PDFs with a filename generated from text frame on page

    Does anyone have an idea how to do the following? We are on InDesign CS5 on a Mac running OS X 10.6.
    We plan on creating a document of, say, 100 pages. Each page will have a photo of a product and some text frames. One of the frames will contain the product's SKU code, which we will enter manually.
    What I want is that we can then export each page as a separate PDF and the filename of each PDF will be taken from the SKU code present in the text frame with .pdf appended.
    Has anyone done something like this?
    Thanks

    Ask in the Scripting forum... InDesign Scripting
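The export script itself belongs in InDesign's scripting environment, but the naming rule from the question (the SKU text with .pdf appended) needs a little care: SKU text may contain characters that are not allowed in filenames, and two pages may share a SKU. A hypothetical Java sketch of just that naming step, assuming slashes, colons and similar characters should become underscores and duplicates get a numeric suffix; the class name and the "untitled" fallback are illustrative choices.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: derive a safe, unique PDF filename from the SKU text
// found in a page's text frame. Reading the frame and exporting the page are
// left to the InDesign script; only the naming logic is sketched here.
final class SkuFileNamer {
    private final Set<String> used = new HashSet<>();

    String fileNameFor(String skuText) {
        // Trim and replace characters that are unsafe in Mac/Windows filenames.
        String base = skuText.trim().replaceAll("[\\\\/:*?\"<>|]", "_");
        if (base.isEmpty()) {
            base = "untitled"; // illustrative fallback for an empty frame
        }
        String name = base + ".pdf";
        // Append a counter if two pages carry the same SKU.
        for (int n = 2; used.contains(name); n++) {
            name = base + "-" + n + ".pdf";
        }
        used.add(name);
        return name;
    }
}
```

Each page's SKU string would be passed through fileNameFor before that page is exported.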

  • Applying Ext. Embedded Fonts to Text

    Hi All,
    I am spinning here. I believe I am missing the intermediate step of getting/creating a font obj to apply to the text format. The TLFTextField (instance name Label) was created on stage and has _typewriter font set. It is in movieclip instance mc and is rotated.
    AS3 CS5 Player 10
    The text is substituted but the Arial Unicode MS font is not applied.
    import flash.display.MovieClip;
    import com.greensock.*;
    import com.greensock.loading.SWFLoader;
    import com.greensock.events.LoaderEvent;
    import fl.text.TLFTextField;
    import flash.text.TextFormat;
    import flash.text.Font;
    var loader:SWFLoader = new SWFLoader("Code/FontTLF.swf", {onComplete:completeHandler});
    loader.load();
    function completeHandler(event:LoaderEvent):void {
       var fontClass:Class = loader.getClass("FontClassName");
       var fontArray:Array = Font.enumerateFonts(true);
       var formatMC:TextFormat = new TextFormat();
       for (var i:int = 0; i < fontArray.length; i++) {
          var font:Font = fontArray[i];
          if (font.fontName == "Arial Unicode MS") {
             trace("name: " + font.fontName); // name: Arial Unicode MS
             trace("typeface: " + font.fontStyle); // typeface: regular
             formatMC.font = font.fontName; // I may need an intermediate step here??
             break;
          }
       }
       trace(formatMC.font); // Arial Unicode MS
       mc.Label.text = "New S"; // Rotated MC with text displays as Times s/b Arial...
       mc.Label.defaultTextFormat = formatMC;
       mc.Label.embedFonts = true;
    }
    Thanks,
    Jim

    Yes, I believe you are right, but...
    Here's a common TLFText example I am referencing.
    import fl.text.TLFTextField;
    import flash.text.TextFieldAutoSize; 
    import flash.text.TextFormat;
    var fmt:TextFormat = new TextFormat();
    fmt.color = 0xFF0000; // red 
    fmt.font = "Arial";
    fmt.size = 32;
    var tlfTxt:TLFTextField = new TLFTextField();
    tlfTxt.defaultTextFormat = fmt;
    tlfTxt.border = true;
    tlfTxt.text = "The quick brown fox jumps over the lazy dog.";
    tlfTxt.wordWrap = true;
    tlfTxt.width = 300;
    tlfTxt.autoSize = TextFieldAutoSize.LEFT;
    tlfTxt.x = tlfTxt.y = 40;
    addChild(tlfTxt);
    It also runs out of sequence, but I am cleaning that up.
    I am trying to use a method that allows me to use plain text strings without markup.
    It is a CD-based captioned video player that runs on a Director platform with Flash Player 10.3 built in: 30-plus languages, some RTL and maybe a BiDi.

  • Embedding Fonts breaks text display

    I'm on Mac OS 10.5.6 (but saw this with earlier versions too)
    using CS3 and/or CS4.
    For work we publish back to Flash 5 player (don't ask) and
    that is where I'm having the problem. If I put a dynamic text box
    on the stage, add some text, embed the font, change the publish
    setting to Flash 5 and publish, I don't see the text. If the box is
    selectable, the player does show the I-beam when I move over where
    it should be, so it does know the text box is there.
    I sent the same file to one of my coworkers and he can
    publish it just fine.
    We tried replacing the fonts on my machine with those from
    his and that didn't solve the problem.
    I've tried using both CS3 and CS4 and get the same results.
    If I change the publish settings to Flash 6 everything works
    like it should.
    Any ideas?

    Tried replacing the Arial Narrow font with a different one
    from another coworker. Still no dice.
    Tried repairing permissions. Nope
    Tried making a new user and checking it there. Nope.
    Moved the font from my Library to the system wide library.
    Bupkiss. (Is that how it is spelled?)

  • Is a scrollable text frame possible with hyperlinks embedded in the text?

    I'm new to this whole digital design world. I have a document set up with several links and some active buttons, and I am struggling with this scrollable text frame. I finally have the frame working, but it only scrolls from the Preview button in the Overlay Creator panel. From the Preview window, when previewing on the Internet, or when exported as a SWF, everything works except the scrolling frame. Anyone have any suggestions??

    You can, but it's not very elegant. For one thing, whatever text you have will basically have to be on another web page, addressable by URL in the iframe's expression. I guess it would work, though. It's just like having two pages where maybe one would suffice, you know?
    There are some JavaScript examples I have come across that would probably work. The HTML tags seem to be unreliable... I have seen some that only work in IE and not in Mozilla-based browsers.
    Anyways, here's an example of a scrolling iframe....it's the messageboard from my Guestmap...
    http://guestmap.dirtdoog.com
    As you can see...it's not great, but it's workable.

  • Convert MS Word to PDF with an embedded Excel File Q

    I have an MS Word document that contains embedded Excel and Word files (double-click these icons and you can view them). How can I convert this MS Word document and still be able to view these embedded files in the PDF? I'm doing the straightforward 'createPDF' using PDFMaker. When it has finished creating the PDF, you can see these embedded images in the PDF file, but you can't access them. Can this be done?
    Thanks

    No. Convert these other documents to PDF and attach them to the new PDF file.

  • Creating PDF with SWF embedded

    Hello all!
    I'm trying to build a PDF file that contains a financial statement, for example, where I could see my balance, withdrawals, things paid, etc.
    I have all the information in a database and I'd like to know if I can achieve this by developing a process in LC Workbench.
    This process will catch a SWF flex based layout, merge with data from a database and then build a PDF file with dynamic features like data ordering.
    The question is, how could I merge a layout built in Flash Builder with data inside a process in Workbench?
    Thanks in advance.
    Diego

    Yes, I have been doing that manually. I want to set up Acrobat to do this automatically for every PDF I generate. For example, I want every PDF to be created so that it opens at 100% magnification, with the document title showing in the title bar instead of the file name. I don't want to have to do this manually. Is there some way to do this automatically when the PDF is generated?
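For reference, the two settings mentioned correspond to entries in the PDF's document catalog: an /OpenAction destination of type /XYZ whose zoom factor is 1 (100%), and the /DisplayDocTitle viewer preference, which tells the viewer to show the Title from the document information dictionary in the title bar instead of the file name. In Acrobat they are set per document under Document Properties > Initial View. The Java sketch below only builds the text of those entries; the class name and the pageRef parameter (an indirect reference to the first page, such as "3 0 R") are hypothetical, and writing them into a real file is left to whatever tool generates the PDF.

```java
// Sketch only: the catalog entries behind "open at 100%" and "show the
// document title". pageRef is a hypothetical indirect reference to the
// first page object, e.g. "3 0 R".
final class OpenSettings {
    static String catalogEntries(String pageRef) {
        // /XYZ left top zoom: null keeps the current position, 1 means 100%.
        return "/OpenAction [" + pageRef + " /XYZ null null 1]\n"
                + "/ViewerPreferences << /DisplayDocTitle true >>";
    }
}
```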

Maybe you are looking for

  • How to display desktop in 16x9 using a projector?

    I have my MacBook Pro hooked up to a projector via VGA. The projector is set up for widescreen 16x9. Will my MacBook Pro show my desktop and programs in widescreen 16x9? How do I do this? Thanks, RD

  • Purchase In-game items

    Hi, my name is Kevin and I am an iPhone user in HK. Earlier I reported to your support team that my Apple ID and credit card information had been stolen and used, so your support team renewed my Apple ID and changed it into a non-credit card(

  • Passing data to second popup

    Hello all, My main application retrieves data from a server(callresponders and such). I have a button that opens a popup. On this popup are two other buttons, one of which opens another popup. Now, I'd like to be able to fetch information about a sel

  • Obiee GENERAL

    1. What is the most widely used ETL tool for data loading? 2.what are bridge tables? 3. explain opaque views 4. explain database Hints. 5. explain fragmented data.

  • Low Disk Space in WSUS

    Hello, the size of the content folder on my WSUS upstream server is 170GB. I am installing updates in one language only and for 46 products. It's showing low disk space now. I used the Server Cleanup Wizard, but it cleared only 2GB. I think something is going wron