Exporting PDFs to text

I have Acrobat 8.0 Standard.
I can search for words, so the PDFs are not images but were created from another program. I do not have access to the original files; I only have the PDFs.
When I export to text, HTML, Word doc, RTF, etc., the text is "mostly" right, but there are many instances where characters are just in the wrong spot.
For example, say the PDF has a couple of lines of text like this:
District 24 District 205 District 216
389 .....
The corresponding export text looks like this:
istrict 24 Distict 25 District 215
Dr0389 ......
"Dr0389" is the problem. The "D" is from "District 24" the "r" is from "District 205" and the "0" is also from "District 205".
If I use the select icon and right-arrow through the document, the cursor moves from the first D, to the r, to the zero, and then to the number 389 on the next line.
Any ideas? Or is it just that the engine that converted the original document to PDF messed up?
Thanks!
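
One possible explanation, not confirmed for this file: export and selection tend to follow the order in which text is stored inside the PDF, which need not match its position on the page. If a tool can report each text fragment with its coordinates, the fragments can be re-sorted by position. The sketch below assumes such fragments are already available; `Fragment`, `rebuild_lines` and the coordinates are hypothetical, and this is not something Acrobat 8 exposes directly.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    text: str
    x: float  # left edge of the fragment (hypothetical units)
    y: float  # baseline, measured from the top of the page


def rebuild_lines(fragments, line_tolerance=2.0):
    """Group fragments into lines by baseline, then order each line left to right."""
    lines = []  # each entry is (baseline_y, [fragments on that line])
    for frag in sorted(fragments, key=lambda f: f.y):
        if lines and abs(lines[-1][0] - frag.y) <= line_tolerance:
            lines[-1][1].append(frag)
        else:
            lines.append((frag.y, [frag]))
    return [
        "".join(f.text for f in sorted(row, key=lambda f: f.x))
        for _, row in lines
    ]


# Stored order differs from visual order: the "D" comes after the rest of its word.
stored = [
    Fragment("istrict 24", x=10, y=100),
    Fragment("D", x=0, y=100),
    Fragment("389", x=0, y=120),
]
print(rebuild_lines(stored))  # ['District 24', '389']
```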

I know PDFs are not intended to be edited, but sometimes you have no choice, and it isn't all that rare. Opening in Illustrator can work after a fashion, but that's no easy trick.
We produce all our PDF newsletters now using InDesign, but for a few years they were made using Quark, and those original Quark files are long gone - all that remains is the PDFs. We are adding these old issues to a searchable database, so I need to add XML tags to all the stories. I haven't figured out a way to do that in the PDF, so I'm trying to put it back into some sort of form where tagging is possible. The pages in these documents are fairly highly designed - text in three columns, pull quotes in boxes centered on the page overlapping all three columns, graphics, etc. - so selecting the text from a whole story is challenging, to say the least. But while labor intensive, it is possible to copy the text, paste into a Word document, then manually kill all the unwanted stuff that came along with the Copy and use a macro of several Find/Replace routines to get rid of all the spurious paragraph returns.
I investigated a promising solution offered by Recosoft (www.recosoft.com) called, appropriately, PDF2ID. It lets you set up parameters, and auto-converts a PDF to a fully-formatted ID file. They offer a somewhat free demo that's both brain-dead (after the first page, the text in the rest of the document is replaced by x's) and time limited. It does a good job of conversion, though. Multiple-column type is rendered with each column as a separate story, so it has to be re-threaded. And it's fairly expensive at $249. But it might be worth a look if you need to do a lot of this.
I'm proceeding with my labor intensive approach. I just have to do about 30 issues, each of which I can convert to taggable Word docs in about a half hour. Very tedious, so I'm doing one a day. Then I'm done.
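
The Find/Replace cleanup described above (removing spurious paragraph returns after pasting) could also be scripted. This is only a sketch of that idea, not the Word macro itself. It assumes the copied text has been saved as plain text and that genuine paragraph breaks survive as blank lines; `unwrap_paragraphs` is an illustrative name.

```python
import re


def unwrap_paragraphs(text):
    """Join hard-wrapped lines into paragraphs; blank lines mark paragraph breaks."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    cleaned = []
    for para in paragraphs:
        # Rejoin words hyphenated across a line break ("news-\nletter" -> "newsletter").
        # Assumption: this also joins genuine compounds split at the hyphen.
        para = re.sub(r"(\w)-\n(\w)", r"\1\2", para)
        # Collapse the remaining line breaks and runs of whitespace into single spaces.
        cleaned.append(re.sub(r"\s+", " ", para).strip())
    return "\n\n".join(cleaned)


sample = "We produce all our PDF news-\nletters now\nusing InDesign.\n\nSecond story\nstarts here.\n"
print(unwrap_paragraphs(sample))
```

Text copied from a multi-column page may still need manual checking, since pull quotes and captions can land in the middle of a story.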

Similar Messages

  • Working with Exported PDF in Word - text jumps around

    I just purchased an Adobe Export PDF subscription and am having trouble working with the document once it has been exported to Word.  The cursor jumps from place to place, as do words.

    Try selecting the text you want to change within Word, then choosing a different font for that text.  I've seen this really help with the text-jumpy issue that occurs from time to time in files converted with ExportPDF. The underlying cause is usually a poorly or incorrectly embedded font in the original PDF.

  • Problems exporting PDF in CS5, text missing, images not transparent....

    I am using InDesign CS5 v7.04 on a Mac running 10.8.1.
    (This exact problem has happened to me in previous versions of ID as well though?)
    When I export a document to PDF and view it in Preview OR place it into another ID document, every other page has an issue where some or all of the text is missing, and images that should be transparent are not, or are missing as well.  Actually, the first two pages in my current document are OK; after that, every other page is missing text or images, or has non-transparent images that should be transparent. The font I am using is Avenir and the images are .png files created in PS. The images are in the master.
    WHY is this happening? What do I need to do to fix?
    Thanks in advance for your assistance. I am a mostly self-taught ID user since version 1, just getting back into using it after a long break.

    pixelpusher_mama wrote:
    OK, sorry, yes, it looks the same in Acrobat as it does in Preview as it does when placed into ID. 
    I was able to save as a non-current version of PDF (Acrobat 4) and it shows correctly. (???) What am I missing, here? I mean, I have a work around, but?
    So the original PDF is bad, rather than there being a problem with the file after placing. That implicates the original document, and since Acrobat 4 compatibility works it's probably either a transparency or layer issue.
    Try trashing your prefs and export again and see if it works. See Replace Your Preferences

  • Trouble exporting pdf with Arabic text to .doc file

    I am having an issue exporting a pdf from the trial of Acrobat Pro XI on a Windows 7 computer.
    The pdf has a table with English fields in the left column and Arabic equivalents in the right column.
    It appears to display fine within Acrobat, so I assume it has fonts embedded and that Acrobat is somehow handling the display of the Arabic font correctly.
    I need to export this to .doc or .docx so that I can enter additional data.
    When I convert to .doc, the English column and fields convert pretty well, but the Arabic fields do not display well at all. I am not sure whether this is a problem with Word (or the Windows OS) requiring an Arabic font or Arabic text support to be installed on the system, or some other issue (such as Acrobat not handling the conversion well).
    I did note that there is an Arabic Font add on to Acrobat that can be downloaded from Here:
    http://www.adobe.com/support/downloads/thankyou.jsp?ftpID=4885&fileID=4558
    However, since Acrobat seems to display the Arabic text properly, I don't know that this is required. Additionally, when I tried to install the Arabic font support from that URL, it told me that Acrobat X was required; I am using Acrobat XI Reader (installed with the trial version of Acrobat Pro that I wish to purchase if I can get this working properly).
    Thanks in advance for any assistance or advice offered on this subject.
    David Dean

    Unfortunately, our conversion engine cannot convert text, such as Arabic, which reads from right to left.  It may be something that we add in the future; however, it is not available at this time.
    Please refer to the article mentioned below : http://forums.adobe.com/docs/DOC-1812
    ~Pranav

  • Exporting pdf to word - text becomes pic blocks not editable

    Just got subscription online and have tried both online and thru READER.  Whenever I convert .pdf to .docx it turns the text into pic block in Word, which I can move around and change the size of, but can't edit text.  I followed similar thread and turned off OCR recognition in READER and it still did same thing.

    Hi worshipdude,
    With OCR turned off, ExportPDF won't convert scanned text to editable/searchable text. Therefore, if the PDF you're trying to convert was created from a scanned document with image text, then the Word document will also have image text.
    Please convert without OCR and then triple-click in the Word document to select the text. Did that do the trick?
    Best,
    Sara

  • Exporting pdf containing Unicode text

    I tried to convert a PDF with Unicode (Urdu language) text into MS Word.  It produced all garbage. . . Any ideas?

    Check FAQ for supported languages.

  • Exporting PDF text to html

    How can I export PDF text and post the exported text on a web page, to which I can then apply Google Translate?  Our organization posts PDF articles from our journal.  (I can manually block and copy the text, so I know the text can be captured.)  I want a program/app/software to run on our website that will allow a user to extract the text from the PDF and display the text as HTML.  From there, the user can apply Google Translate.  So does anyone know how I can do this?  It doesn't seem like a difficult task -- I can do it manually -- but I want an app that will do it automatically.

    Thanks for the reply.  Do you have any idea how I could do what I want to do, perhaps with some other software?

  • It concerns adobe export pdf program. When we open  this program, it appears on the right of the screen." recognize the text in english"but. we would like to change it for french language. Because when we export the document under word program , it uses

    This concerns the Adobe ExportPDF program. When we open the program, "recognize the text in english" appears on the right of the screen, but we would like to change it to French, because when we export the document to Word, it uses the English dictionary to correct the text. Thanks for your answer.

    [topic moved from Developers to Acrobat.com forum]

  • Anti-aliased text when exporting PDF to image

    I need to be able to batch-convert multi-page PDFs to individual bitmap images (one image for each page) with anti-aliased text.
    Photoshop works this way if you open a single PDF, allowing you to select one or more pages to rasterize as separate images, but not when batch processing (specifically, if you use the Image Processor script on a folder of images and PDFs, it will rasterize the PDFs automatically, but it will only do so with the first page of each PDF.)
    Acrobat, on the other hand, automatically creates an image for each page when exporting, and can do this in a batch sequence, but the text is not anti-aliased, making the image look like a screenshot from 1997. No matter how high an image resolution you select, the text is still jagged when you zoom in.
    So, is there a setting I'm missing that will allow the text to be anti-aliased when using Acrobat to export PDF to an image? I am using Acrobat 8, not 9, so something might have changed in the newest version.

    Not sure about Acrobat Pro 8, but in Acrobat Pro 9 (not Extended) you can Export>Image>Multiple Choice: JPEG, JPEG2000, PNG, TIFF.  I used JPEG and under the options in the export dialog box, leave the filename as is to coincide with the PDF filename and then choose Maximum Resolution under File Settings: Grayscale (JPEG, Quality:Maximum); Color (JPEG, Quality:Maximum) . . . skip down to Conversion Colorspace: Determine Automatically and Resolution choose 600pixels/inch for a letter size document.  This will result in a file size of 1.3MB per JPEG image if there is not a lot of information on the page.  I chose a simple header, footer with page numbering, and 5 lines of Lorem Ipsum text.  600dpi is overkill, you can go for 300dpi and still result in a decent image that will be able to be printed on a laser photocopier that is connected to a production computer.  Obviously if you are printing to a laser printer or a high quality inkjet 300dpi will suffice as well for a letter sized document.  But I have been told that 300dpi is not a standard rule of thumb and you must obtain specs from your printer since he/she can calculate by very strict rules the dpi you need for your content.  It depends on whether you have background images such as watermarks and also if your text body contains line-art.
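
    The resolution figures above are easier to compare as pixel counts. A minimal sketch, assuming a US Letter page of 8.5 x 11 inches and 3 bytes per pixel for uncompressed RGB; the function names are illustrative and not part of Acrobat:

```python
def pixel_size(width_in, height_in, dpi):
    """Pixel dimensions of a page rasterized at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_megabytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Size of the raw bitmap before JPEG/PNG compression, in megabytes (10**6 bytes)."""
    width_px, height_px = pixel_size(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


print(pixel_size(8.5, 11, 300))  # (2550, 3300)
print(pixel_size(8.5, 11, 600))  # (5100, 6600)
print(uncompressed_megabytes(8.5, 11, 300))
print(uncompressed_megabytes(8.5, 11, 600))
```

    Doubling the resolution quadruples the pixel count, so the raw bitmap at 600 pixels/inch is four times the size of the one at 300.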

  • When trying to export pdf to .docx the text doesn't convert. how do I remedy this?

    How do I get exported pdf to copy exact document, text, graphics and all?

    Hi hamsa142,
    I'm sorry that you're having trouble converting your PDF files to Word. I see you posted a similar question earlier. Please try disabling OCR as outlined here: How to disable Optical Character Recognition (O... | Adobe Community
    Let us know how it goes.
    Best,
    Sara

  • Will Export PDF transfer graphics to Word as well as text?

    Does Export PDF transfer graphics as well as text to Word? 

    Hi Mary,
    Yes it will as long as the entire file is below 100 MB. Let me know if you have further questions!
    Regards, Stacy

  • Images present in datagridview not exporting to file only text contents are generating into PDF file..

    Hi Everyone,
       I have created a simple desktop app in which I am trying to generate a PDF file from a DataGridView. When I click the ExportPDF button, the PDF file is generated successfully, but the images present in the DataGridView are not written to the PDF; only the text contents appear in the PDF file.
      Can anyone tell me how to generate the PDF file along with the images?
    Here is my code:
    private void btnexportPDF_Click(object sender, EventArgs e)
    {
        int ApplicationNameSize = 15;
        int datesize = 12;
        Document document = null;
        try
        {
            SaveFileDialog savefiledg = new SaveFileDialog();
            savefiledg.Filter = "All Files | *.* ";
            if (savefiledg.ShowDialog() == DialogResult.OK)
            {
                string path = savefiledg.FileName;
                document = new Document(PageSize.A4, 3, 3, 10, 5);
                PdfWriter writer = PdfWriter.GetInstance(document, new FileStream(path + ".pdf", FileMode.Create));
                document.Open();
                // Creates a phrase to hold the application name at the left hand side of the header.
                Phrase phApplicationName = new Phrase("Sri Lakshmi Finance,Hosur-560068", FontFactory.GetFont("Arial", ApplicationNameSize, iTextSharp.text.Font.NORMAL));
                // Creates a phrase to show the current date at the right hand side of the header.
                Phrase phDate = new Phrase(DateTime.Now.ToLongDateString(), FontFactory.GetFont("Arial", datesize, iTextSharp.text.Font.NORMAL));
                document.Add(phApplicationName);
                document.Add(phDate);
                iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance("D:\\logo.JPG");
                document.Add(img);
                iTextSharp.text.Font font5 = iTextSharp.text.FontFactory.GetFont(FontFactory.TIMES_ROMAN, 5);
                iTextSharp.text.Font font6 = iTextSharp.text.FontFactory.GetFont(FontFactory.HELVETICA_BOLD, 6);
                //float[] columnDefinitionSize = { 2.5f, 7.0f,6.6f, 8.6f, 6.6f, 5.0f, 4.5f, 7.0f, 6.3f, 7.0f, 3.5f, 6.0f, };
                PdfPTable table = null;
                table = new PdfPTable(dataGridView1.Columns.Count);
                table.WidthPercentage = 100;
                PdfPCell cell = null;
                foreach (DataGridViewColumn c in dataGridView1.Columns)
                {
                    cell = new PdfPCell(new Phrase(new Chunk(c.HeaderText, font6)));
                    cell.HorizontalAlignment = PdfPCell.ALIGN_CENTER;
                    cell.VerticalAlignment = PdfPCell.ALIGN_CENTER;
                    cell.BackgroundColor = new iTextSharp.text.BaseColor(240, 240, 240);
                    table.AddCell(cell);
                }
                if (dataGridView1.Rows.Count > 0)
                {
                    for (int i = 0; i < dataGridView1.Rows.Count; i++)
                    {
                        PdfPCell[] objcell = new PdfPCell[dataGridView1.Columns.Count];
                        for (int j = 0; j < dataGridView1.Columns.Count - 0; j++)
                        {
                            cell = new PdfPCell(new Phrase(dataGridView1.Rows[i].Cells[j].Value.ToString(), font5));
                            cell.HorizontalAlignment = PdfPCell.ALIGN_LEFT;
                            cell.VerticalAlignment = PdfPCell.ALIGN_LEFT;
                            cell.Padding = PdfPCell.ALIGN_LEFT;
                            objcell[j] = cell;
                        }
                        PdfPRow newrow = new PdfPRow(objcell);
                        table.Rows.Add(newrow);
                    }
                }
                document.Add(table);
                MessageBox.Show("PDF Generated Successfully");
                document.Close();
            }
            else
            {
                //Error
            }
        }
        catch (FileLoadException fle)
        {
            MessageBox.Show(fle.Message);
            MessageBox.Show("Error in PDF Generation", "Error", MessageBoxButtons.OK, MessageBoxIcon.Error);
        }
    }
    Runtime Gridview content:
    Generated PDF File:
    Thanks & Regards RAJENDRAN M

    Hello,
    Since this issue is mainly related to iTextSharp which belongs to third-party, I would recommend you consider posting this issue on its support website to get help.
    Maybe the following forum will help.
    http://support.itextpdf.com/forum/26
    Regards,
    Carl

  • Export pdf and save as using digits.pdf eliminating all the text characters from the source file.

    Hi all,
    Usually, when we export a PDF from InDesign or any other software, the PDF is named after the source file.
    e.g., 123456_dfdfjljf_ULCC.indd will be exported and saved as 123456_dfdfjljf_ULCC.pdf.
    My request: the PDF should be saved as 123456.pdf, dropping all the characters that follow the leading digits.
    Hint: the leading digits may be more or fewer than 4 digits.
    Whatever the number of leading digits, the file should be saved as, say, 23232.pdf, dropping the characters that follow.
    Please help me; I have a huge number of PDFs to export from InDesign.
    Thanks,
    bobylon.

    There is no "whatever" here - when you export from InDesign or other software each piece of software makes its own naming decisions. It isn't an Acrobat or JavaScript thing at all.
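
    One way around this, outside InDesign, is to rename the exported files afterwards. A minimal sketch of the naming rule only, assuming the wanted name is the run of digits at the start of the file name; `digits_only_name` is an illustrative helper, not an InDesign or Acrobat function:

```python
import re
from pathlib import Path


def digits_only_name(filename):
    """Return '<leading digits>.pdf', or '<original stem>.pdf' if no digits lead the name."""
    stem = Path(filename).stem
    match = re.match(r"\d+", stem)
    return (match.group(0) if match else stem) + ".pdf"


print(digits_only_name("123456_dfdfjljf_ULCC.indd"))  # 123456.pdf
print(digits_only_name("23232_cover_final.pdf"))      # 23232.pdf
```

    To apply it to a folder of exported PDFs, each file could be renamed with `Path.rename`, after checking that two source files do not collapse to the same digits.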

  • Exporting to PDF right hand text truncation when using ForceLargerFonts.

    Hi,
    We are using Crystal Reports Basic .NET 2008 SP0 (10.5.3700.0) under ASP.NET.  We have decided to use the PDF printing option instead of ActiveX which works fine except for the font sizing issues.  After reading a few articles we resolved the font sizing issue by implementing the registry key ForceLargerFonts.
    HKEY_CURRENT_USER\Software\Business Objects\10.5\Crystal Reports\Export\PDF
    "ForceLargerFonts"=dword:00000001
    This has solved the problem of font scaling; however, a new problem emerged relating to truncation on the right-hand side of textboxes, especially when the fonts are extremely small and the textbox is large (it appears that Crystal can't calculate the wrapping correctly), as described in this article: http://ourchiwai.blogspot.com/2009/03/crystal-report-xi-service-packs-5-solve.html
    We located an apparent fix in Crystal XI SP5 using these keys
    "UsePrecisePositioningForText"=dword:00000001
    "TruncationAdjustment"=dword:00000002
    however I can't seem to see any reference for this fix in Crystal Basic 2008.  Does anyone know if this fix is available in the Crystal Basic 2008 .NET version?
    Thanks
    Glenn

    Over 3 posts to avoid the 2500 formatting limitation.
    Sections of Web.Config
    <controls>
            <add tagPrefix="CR"  namespace="CrystalDecisions.Web" assembly="CrystalDecisions.Web, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL" />
    <assemblies>
            <add assembly="CrystalDecisions.Web, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL"/>
            <add assembly="CrystalDecisions.Shared, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL"/>
            <add assembly="CrystalDecisions.ReportSource, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL"/>
            <add assembly="CrystalDecisions.Enterprise.Framework, Version=11.5.3300.0, Culture=neutral, PublicKeyToken=692fbea5521e1304"/>
            <add assembly="CrystalDecisions.Enterprise.Desktop.Report, Version=11.5.3300.0, Culture=neutral, PublicKeyToken=692fbea5521e1304"/>
            <add assembly="CrystalDecisions.CrystalReports.Engine, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL"/>
            <add assembly="CrystalDecisions.Enterprise.InfoStore, Version=11.5.3300.0, Culture=neutral, PublicKeyToken=692fbea5521e1304"/>
            <add assembly="CrystalDecisions.ReportAppServer.ClientDoc, Version=11.5.3300.0, Culture=neutral, PublicKeyToken=692fbea5521e1304"/>
    <httpHandlers>
         <add path="CrystalImageHandler.aspx" verb="GET" type="CrystalDecisions.Web.CrystalImageHandler, CrystalDecisions.Web, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL"/>
        <handlers>
          <add name="CrystalImageHandler.aspx_GET" verb="GET" path="CrystalImageHandler.aspx" type="CrystalDecisions.Web.CrystalImageHandler, CrystalDecisions.Web, Version=11.5.3700.0, Culture=neutral, PublicKeyToken=692fbea5521e1304, processorArchitecture=MSIL" preCondition="integratedMode"/>

  • Enable tool tip in Exported PDF document from a CRE Report

    Hi All,
    We are on BObj 4.0 and using CRE for our reports. We have enabled tool tip text on mouse-over for certain labels, which works great in the browser. When we export to PDF, we don't see the tool tip text. Is there a way we can configure this in CRE so that the exported PDF also shows the tool tip text?
    Regards,
    Sundar

    I checked online and found this Note - 1567798 - Crystal Reports does not export Tool Tips when exported to Pdf Format. Does this apply to CRE also or just CR 2008?
    Regards,
    Sundar
