OCR-text from pdf to pdf ?

Is it possible to copy the OCR-text from a pdf file obtained after Finereader and paste it in the original pdf?

Having had to perform OCR indicates that the original PDF is a scanner's output. That means each page's content is an image of the text, not the text itself.
An alternative approach:
Using Acrobat XI Pro, create a new, blank PDF page (****+Ctrl+T).
Copy the OCR text.
Use Acrobat XI's Tools - Add Text and paste the OCR text at the cursor location.
Be well...

Similar Messages

  • How to read/extract text from pdf

    Respected All,
    I want to read/extract text from a PDF. I tried using Etymon but did not succeed.
    Could anyone guide me in this?
    Thanks and regards,
    Ajay.

    Thank you very much Abhilshit, PDFBox works for reading pdf.
    Regards,
    Ajay.

  • Copying text from pdf with embedded font

    I have tried everything to copy and paste text from a PDF into Word. I think because it has an embedded font it comes over as garbled. I have downloaded the font, tried to open it in several other apps, and viewed it as HTML to copy and paste ...
    Does anyone have a trick they can share with me before I poke my eyes out?
    thank you

    Thanks for your prompt reply.
    As I said, I have the font installed on my system. For your reference,
    the first link below is the PDF file and the second link is
    the fonts used. Kindly help me sort out this issue.
    https://www.yousendit.com/download/T2dkcHBEVEh0QTIwYjhUQw
    https://www.yousendit.com/download/T2dkcHBFQXBrYUJYd3NUQw

  • Copying text from PDF to Pages

    I am trying to copy text from a PDF file into Pages. After I paste the copied text into my new Pages document, the spacing between most of the words is lost.
    For example,
    "Copying text from PDF to Pages" is imported as "CopyingtextfromPDFtoPages".
    Does anyone know how to correct this?
    Imac   Mac OS X (10.4.7)  

    Rishi,
    Welcome to Apple Discussions.
    After reading your post, I tried to duplicate this problem. I opened a PDF, selected a sentence, then copied it to the clipboard. I then opened Pages, selected the blank template, then pasted in the text. It pasted perfectly.
    Does this problem happen with all text in a PDF? With different PDFs?
    -Dennis

  • Will not print text from PDFs - all other print is fine - Using nitro reader - Win7- HP4255

    Will not print text from PDFs - all other print is fine - Using nitro reader - Win7- HP4255

    Mulga
    Welcome to the HP Community Forum.
    Have you tried asking your question on the Nitro-Reader Forum?
    Nitro Reader Forum
    If you would like to try using the Adobe Reader, you might find help here:
    Manage Print Output with Print Preview
    See the section on PDF files
    I am pleased to provide assistance on behalf of HP. I do not work for HP. 
    Kind Regards,
    Dragon-Fur

  • Extract Text from pdf using C#

    Hi,
    We are solution developers using Acrobat. Because we have a requirement to extract text from PDFs using C#, we have downloaded and installed the Adobe SDK. We found only four examples in C#, and those only cover viewing a PDF in a Windows application. Can you please guide us on how to extract text from a PDF using the SDK in C#?
    Thank you for your help.
    Regards
    kiranmai

    Okay, so I went ahead and added text extraction to my own C# application, since the client had requested it anyway. We were originally told to bypass the feature if it wasn't "cut and dried", but it wasn't bad, so I gave the client the text extraction they wanted. I decided to post the source code here for you. It returns the text of the entire document as a string.
    using System;
    using System.Collections.Generic;
    using System.Reflection;
    using Acrobat; // COM interop assembly from the Acrobat SDK

    // Returns the text of the entire document as one string.
    private static string GetText(AcroPDDoc pdDoc)
    {
        int pages = pdDoc.GetNumPages();
        string pageText = "";
        for (int i = 0; i < pages; i++)
        {
            List<string> words = new List<string>();
            try
            {
                // Word access goes through Acrobat's JavaScript bridge object.
                object jso = pdDoc.GetJSObject();
                if (jso != null)
                {
                    object[] args = new object[] { i };
                    object jsNumWords = jso.GetType().InvokeMember("getPageNumWords", BindingFlags.InvokeMethod, null, jso, args, null);
                    int numWords = Int32.Parse(jsNumWords.ToString());
                    // Word indices are zero-based, so stop before numWords.
                    for (int j = 0; j < numWords; j++)
                    {
                        // bStrip = false keeps trailing punctuation and whitespace.
                        object[] argsj = new object[] { i, j, false };
                        object jsWord = jso.GetType().InvokeMember("getPageNthWord", BindingFlags.InvokeMethod, null, jso, argsj, null);
                        words.Add((string)jsWord);
                    }
                }
            }
            catch (Exception)
            {
                // Best effort: keep whatever words were read before the failure.
            }
            pageText += string.Concat(words);
        }
        return pageText;
    }

  • Editing text from pdf file

    How do I edit text in a PDF file?

    Adobe Reader does not allow editing the text of a PDF document. You will need to get Acrobat on your Windows or Mac to do that.

  • Applescript or workflow to extract text from PDF and rename PDF with the results

    Hi Everyone,
    I get supplied hundreds of PDFs which each contain a stock code, but the PDFs themselves are not named consistently, or they are supplied as multi-page PDFs.
    What I need to do is name each PDF with the code which is in the text on the PDF.
    It would work like this in an ideal world:
    1. Split PDF into single pages
    2. Extract text from PDF
    3. Rename PDF using the extracted text
    I'm struggling with part 3!
    I can get a text file containing just the code (I'm extracting the code with a call to BBEdit).
    I did think about using a variable for the name, but the rename function doesn't let me use variables.

    Hello
    You may also try the following AppleScript, which is a wrapper around a RubyCocoa script. It asks you to choose the source PDF files and a destination directory. It then scans the text of each page of the PDF files for a predefined pattern and saves each matching page as a new PDF file in the destination directory, named with the string the pattern extracted. Pages that contain no string matching the pattern are ignored. (Ignored pages, if any, are reported in the result of the script.)
    Currently the regex pattern is set to:
    /HB-.._[0-9]{6}/
    which means HB- followed by two characters and _ and 6 digits.
    Minimally tested under 10.6.8.
    Hope this may help,
    H
    _main()
    on _main()
        script o
            property aa : choose file with prompt ("Choose pdf files.") of type {"com.adobe.pdf"} ¬
                default location (path to desktop) with multiple selections allowed
            set my aa's beginning to choose folder with prompt ("Choose destination folder.") ¬
                default location (path to desktop)
            set args to ""
            repeat with a in my aa
                set args to args & a's POSIX path's quoted form & space
            end repeat
            considering numeric strings
                if (system info)'s system version < "10.9" then
                    set ruby to "/usr/bin/ruby"
                else
                    set ruby to "/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/bin/ruby"
                end if
            end considering
            do shell script ruby & " <<'EOF' - " & args & "
    require 'osx/cocoa'
    include OSX
    require_framework 'PDFKit'
    outdir = ARGV.shift.chomp('/')
    ARGV.select {|f| f =~ /\\.pdf$/i }.each do |f|
        url = NSURL.fileURLWithPath(f)
        doc = PDFDocument.alloc.initWithURL(url)
        path = doc.documentURL.path
        pcnt = doc.pageCount
        (0 .. (pcnt - 1)).each do |i|
            page = doc.pageAtIndex(i)
            page.string.to_s =~ /HB-.._[0-9]{6}/
            name = $&
            unless name
                puts \"no matching string in page #{i + 1} of #{path}\"
                next # ignore this page
            end
            doc1 = PDFDocument.alloc.initWithData(page.dataRepresentation) # doc for this page
            unless doc1.writeToFile(\"#{outdir}/#{name}.pdf\")
                puts \"failed to save page #{i + 1} of #{path}\"
            end
        end
    end
    EOF"
        end script
        tell o to run
    end _main
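The naming step of the script above can be tried in isolation. The following is a minimal Python sketch, not part of the posted script: it applies the same pattern to a page's extracted text and builds the output file name. The function name `stock_code_filename` and the sample code `HB-AB_123456` are made up for illustration, and the text extraction itself is assumed to have been done elsewhere.

```python
import re
from typing import Optional

# Same pattern as in the post: "HB-", any two characters, "_", six digits.
STOCK_CODE = re.compile(r"HB-.._[0-9]{6}")


def stock_code_filename(page_text: str) -> Optional[str]:
    """Return '<code>.pdf' for the first stock code in page_text, or None."""
    match = STOCK_CODE.search(page_text)
    if match is None:
        return None  # the script above ignores such pages
    return f"{match.group(0)}.pdf"


# Hypothetical page text, for illustration only.
print(stock_code_filename("Stock item HB-AB_123456, qty 4"))
print(stock_code_filename("No code on this page"))
```

Testing the pattern against a few real page strings this way is a quick check before running the full script over hundreds of files.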

  • How to read line number text from PDF using plugin?

    Hi, I would like to know how to read the line number of a piece of text in a PDF using a plug-in.
    Thanks in advance.

    Ok, some background reading of the PDF Reference will help you understand why this is so difficult. PDF files are not organised into lines. It is best to think of each word or character on the page as being a graphic with its own position. The human eye sees lines where a series of graphics (words) are roughly in the same horizontal region.
    In the general case it is difficult or even impossible to answer this. You may have columns with different spacing (but the PDF stores no information on what is a column). You may have subscripts and superscripts. You may have text in graphics coinciding with other text. Commonly, there may be titles, headings or page numbers which are just ordinary text and might count as lines.
    That said, what you need to do is extract the text on the page and its positions. The WordFinder APIs are the way to do that. Now, sort all the words out, using the Y coordinates and size to try and guess what makes a "line". Now you are in a position to find the text (divided into words, not strings) and report the "line number" you have estimated.
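The sorting-and-guessing step described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the WordFinder API: the `Word` record, its fields, and the `tolerance` threshold are hypothetical, and the positions are assumed to have been extracted already (with Y growing upward, as in PDF coordinates).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    text: str
    x: float       # left edge of the word
    y: float       # baseline; PDF coordinates grow upward
    height: float  # font size or glyph-box height


def group_into_lines(words: List[Word], tolerance: float = 0.5) -> List[List[Word]]:
    """Guess lines: a word joins the current line when its baseline is
    within tolerance * height of that line's first word."""
    lines: List[List[Word]] = []
    # Visit words from the top of the page down (largest y first).
    for word in sorted(words, key=lambda w: -w.y):
        if lines:
            anchor = lines[-1][0]
            limit = tolerance * max(anchor.height, word.height)
            if abs(anchor.y - word.y) <= limit:
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each line, order words left to right.
    for line in lines:
        line.sort(key=lambda w: w.x)
    return lines


def line_number_of(lines: List[List[Word]], target: str) -> Optional[int]:
    """Return the 1-based estimated line number containing target, if any."""
    for number, line in enumerate(lines, start=1):
        if any(w.text == target for w in line):
            return number
    return None


words = [
    Word("Hello", 10, 700, 12), Word("world", 50, 700.5, 12),
    Word("Second", 10, 680, 12), Word("2", 60, 684, 8),  # "2" is a superscript
]
lines = group_into_lines(words)
print([[w.text for w in line] for line in lines])
print(line_number_of(lines, "Second"))
```

The tolerance is a guess, as the reply warns: multi-column layouts, headers, and page numbers would still need handling of their own.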

  • How can I copy text from PDF and include the source filename in the pasted selection?

    I'm a biologist and frequently cut-and-paste notes from PDFs of scientific articles.  I name all of the PDF articles with their PubMed ID, a short unique identifier (e.g. 19397482.pdf).  When I take notes, I will select a few sentences from the PDF and then paste them into a text editor for later reference. 
    Can anyone suggest a method or script that would allow me to paste the copied text with the PubMed filename included in a single action?  I would want the pasted output to look something like this, with the filename appended to the end:
    Of the transcripts that were significantly different, there was a greater number of transcripts that were down-regulated in the IVC embryos (380) than the number of transcripts that were up-regulated (208).  [20668257.pdf]
    This would really help me to properly cite information sources during the writing process.  I know there are bibliography managers that might be able to do something like this, but I prefer to read the PDF articles directly in Preview and select the text as I am reading. 
    Thanks very much for any suggestions / ideas.
    jjw

    To copy and paste in a single action:
    tell application "Preview" to activate
    tell application "System Events" to tell process "Preview"
        -- Get the PubMed ID:
        get the title of the front window
        set thePubMedID to word 1 of result
        -- Copy the selected text to the clipboard:
        keystroke "c" using {command down} -- ⌘C
        delay 0.25 -- adjust if necessary
        -- Add the PubMed ID to the contents of the clipboard:
        set theNotes to the clipboard
        set the clipboard to (theNotes & space & "[" & thePubMedID & ".pdf]")
    end tell
    tell application "Notational Velocity" to activate
    tell application "System Events"
        -- Paste the contents of the clipboard to the end of the Notational Velocity document
        key code 125 using command down -- ⌘↓
        keystroke return & return
        keystroke "v" using {command down} -- ⌘V
    end tell
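The string handling in the script above (append the source filename in brackets to the copied text) can be sketched on its own. This is a minimal Python illustration, assuming the note text and the PDF path are already in hand; the function name `with_source` and the example path are hypothetical, and no clipboard or Preview automation is attempted.

```python
import os


def with_source(note: str, pdf_path: str) -> str:
    """Append the PDF's file name in square brackets to the copied note."""
    filename = os.path.basename(pdf_path)
    return f"{note.strip()} [{filename}]"


# Hypothetical path, for illustration only.
print(with_source("A copied sentence from the article.", "/Users/jjw/papers/20668257.pdf"))
```

Taking the name from the file path, rather than from the window title as the AppleScript does, avoids depending on how Preview formats its title bar.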

  • Copying text from PDF created using print to PDF function in OS X

    I use a MacBook Pro with Mac OS X Lion, and Microsoft Word 2008 for Mac and Adobe Acrobat Pro.
    For some reason, when I use the Print to PDF function to export a PDF of a Word document, then open it with Acrobat Reader or Acrobat Pro 9, select and copy text, and paste it into a word processor (including Word 2008), the resulting text is gibberish. It looks like some sort of encoding issue, but I can't understand that, since it's all happening on the same Mac! I have also tried this with Preview as the PDF reader, but I still get gibberish.
    The issue first started occurring with Snow Leopard, and all software is patched, but no dice.
    I've attempted to work around this by using all of the different PDF options under the print dialog, and by saving the doc as a PDF, but I still get the same thing.
    I've also tried copying and pasting the text into Pages, then saving it as a PDF and trying to read it ... again, no luck.  I was able to output the file directly from Pages to Preview and save it from there, but it really doesn't seem like this should be necessary, given that the functionality is built into the OS.
    Does anybody else have experience with this? I have just one user that needs to copy and paste text from the doc, so it's a real pain to have to maintain separate PDF and Word versions.
    Thanks!
    D


  • Need to change auto generated text from pdf report

    I have implemented PDF output, but its text is generated from the auto-generated XML.
    What do I need to do to change that text?
    Below are my XSL file and the generated PDF output.
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" indent="yes" encoding="utf-8" omit-xml-declaration = "yes" />
    <xsl:template match="/">
    <fo:root>
    <fo:layout-master-set>
    <fo:simple-page-master master-name="my-page">
    <fo:region-body margin="1in"/>
    <fo:region-before extent="1in" background-color="silver" />
    </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
         <fo:static-content flow-name="xsl-region-before">
    <fo:block height="150px" width="1024px" background-color="blue" >
    <fo:external-graphic width="340px" src="http://localhost:9000/web-determinations9000/images/logo.png"/>
    </fo:block>
    </fo:static-content>
    <fo:flow flow-name="xsl-region-body">
    <fo:block >
    <xsl:apply-templates mode="dump" select="/session/entity/instance/attribute"/>
    </fo:block>
    </fo:flow>
    </fo:page-sequence>
    </fo:root>
    </xsl:template>
    </xsl:stylesheet>
    Output from the PDF:
    What is Student Id?=
    65.0
    Is scored more than 80% true%=
    true
    What is Student Name?=
    asf
    What is lastname?=
    asdf
    Is student eligible for gold medal?=
    true
    If I want the text "What is Student Id?" to read just "Student Id", what do I need to change?

    Hi,
    Following is the code sample for your understanding:
    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" indent="yes" encoding="utf-8" omit-xml-declaration = "yes" />
    <xsl:template match="/">
    <fo:root>
    <fo:layout-master-set>
         <fo:simple-page-master master-name="my-page">
              <fo:region-body margin="1in"/>
              <fo:region-before extent="1in" background-color="silver" />
         </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
         <fo:static-content flow-name="xsl-region-before">
              <fo:block height="150px" width="1024px" background-color="blue" >
              <fo:external-graphic width="340px" src="http://localhost:9000/web-determinations9000/images/logo.png"/>
              </fo:block>
         </fo:static-content>
         <fo:flow flow-name="xsl-region-body">
              <fo:block>
              <xsl:apply-templates mode="dump" select="/session/entity[@name='global']/instance[@label='global']"/>
              </fo:block>
         </fo:flow>
    </fo:page-sequence>
    </fo:root>
    </xsl:template>
    <xsl:template match="/session/entity[@name='global']/instance[@label='global']" mode="dump" priority="100">
         <fo:block margin-top=".5cm" margin-left=".1cm" border-start-style="solid">
              Student ID:<xsl:value-of select="attribute[@id='var_student_id']/number-val"/>
         </fo:block>
    </xsl:template>
    </xsl:stylesheet>
    Thanks,
    Aakarsh

  • Preview - Cannot copy and paste text from pdf

    Hi there ...
    I usually have no problem copying and pasting text from a pdf using Preview ...
    However .. I recently received a pdf ... and there is no way it will highlight the text to copy ...
    The older pdf's are not affected ...
    I did a get info on the pdf ... it's not locked ... and it was made by Adobe In Design CS2
    I tried opening it with Adobe Reader 9 ...... but still no luck
    In my search .. I noticed one other person had the same problem ... but he didn't get any responses..
    Thanks for any info ....

    If it's behind something else then you can open it in Illustrator and select the specific object you want. If it's rasterized then no.
    If this is a document you can share with the world then I'm sure we could tell you what specifically is going on if you posted it here.

  • Can't Copy Text From PDF Within Gmail Preview

    Hi,
    If I click on a pdf attachment while inside gmail, I get the expected preview window of that PDF. However, I can't copy text from it. If I press Command-C, it just makes a beeping noise letting me know nothing is being copied. If I select text and right-click, there's no 'copy' option. Also, in the top toolbar, Copy is greyed out in the Edit menu. However, if I just hold down Command-C for a couple seconds the text will copy anyways (while hearing a dozen audible beeps). I can copy text without issue using Firefox from the same PDF attachment preview inside Gmail.
    Thanks for any help.
    Yosemite 10.10.1
    Safari Version 8.0 (10600.1.25.1)

    Having same issue as well. I've had this problem pre-Yosemite and it seems to be Safari related. Copy to clipboard in Gmail preview works fine in Chrome & Firefox, just not Safari. As you indicated if you hold ctrl-c long enough it will eventually copy, just forces you to listen to the series of annoying bongs.
    Also there's no copy to clipboard in the context menu as illustrated below:
    http://i.imgur.com/adyyJO4.png
    Pretty annoying really, I'm not about to download a dozen attached PDFs to copy to clipboard, so now I'm sitting in the Delta lounge with my laptop bonging away as I wait for Safari to copy to clipboard and everyone looks at me like I'm a moron.
    NOTE: No I do not have acrobat or any other software loaded besides vanilla Safari. This has gone on on my MBP, Mac Pro and now this brand new MBP I got Monday as a company upgrade, it literally has nothing installed on it at the moment.

  • Copy text from PDF and paste to imessage

    Before iOS 7 I was able to copy text from a PDF file and paste it normally into a text message or iMessage.  Now when I do the same thing, it doesn't paste the actual text I copied; it pastes an HTML attachment instead.  Have I changed a setting, or is this just a bug that needs to be fixed?  Please help.

    I get the PDF in an email and open it to view it.  I'm not sure what the PDF opens in automatically, but that is the way I've been doing it since I've had the iPhone 4, and I hadn't had issues until iOS 7.
    I just now opened a file in the default view and then switched it to open in Adobe Reader, and there I can copy and paste normally.  So now I have to do an extra step every time?
