How to read line number text from PDF using plugin?

Hi, I would like to know how to read line number text from PDF using plugin?
Thanks in advance.

Ok, some background reading of the PDF Reference will help you understand why this is so difficult. PDF files are not organised into lines. It is best to think of each word or character on the page as being a graphic with its own position. The human eye sees lines where a series of graphics (words) are roughly in the same horizontal region.
In the general case it is difficult or even impossible to answer this. You may have columns with different spacing (but the PDF stores no information on what is a column). You may have subscripts and superscripts. You may have text in graphics coinciding with other text. Commonly, there may be titles, headings or page numbers which are just ordinary text and might count as lines.
That said, what you need to do is extract the text on the page and its positions. The WordFinder APIs are the way to do that. Now, sort all the words out, using the Y coordinates and size to try and guess what makes a "line". Now you are in a position to find the text (divided into words, not strings) and report the "line number" you have estimated.
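The grouping step described above can be sketched roughly as follows. This is written in Python for brevity, not in the plugin's C/C++, and it assumes the words have already been extracted (for example with the WordFinder APIs) into hypothetical `(text, x, y, height)` tuples, where `y` is the baseline measured upward from the bottom of the page. The tuple layout, the function names and the half-height tolerance are illustrative assumptions, not part of the Acrobat SDK.

```python
from typing import List, Tuple

# Hypothetical word record: (text, x, y_baseline, height).
Word = Tuple[str, float, float, float]


def group_into_lines(words: List[Word], tolerance: float = 0.5) -> List[List[str]]:
    """Guess text lines by clustering words whose baselines are close together."""
    lines: List[List[Word]] = []
    # y grows upward on the page, so descending y walks from top to bottom.
    for word in sorted(words, key=lambda w: -w[2]):
        _, _, y, height = word
        if lines:
            ref = lines[-1][0]  # first word of the current line is the reference
            if abs(ref[2] - y) <= tolerance * max(ref[3], height):
                lines[-1].append(word)
                continue
        lines.append([word])
    # Within each guessed line, order the words left to right.
    return [[w[0] for w in sorted(line, key=lambda w: w[1])] for line in lines]


def find_line_number(words: List[Word], target: str) -> int:
    """Return the 1-based estimated line number containing target, or -1."""
    for number, line in enumerate(group_into_lines(words), start=1):
        if target in line:
            return number
    return -1
```

The tolerance is scaled by the word height so that small superscripts or subscripts near a baseline tend to stay on the same line. Multi-column pages would still need a separate pass to split on X position first.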

Similar Messages

  • Extract Text from pdf using C#

    Hi,
    We are solution developers using Acrobat. We have a requirement to extract text from PDF files using C#, so we downloaded and installed the Adobe SDK. We found only four examples in C#, and those only show how to view a PDF in a Windows application. Can you please guide us on how to extract text from a PDF using the SDK in C#?
    Thank you for your help.
    Regards
    kiranmai

    Okay, so I went ahead and added text extraction to my own C# application, since the client had requested the feature anyway. We were originally told to skip it if it wasn't "cut and dried", but it wasn't bad, so I gave the client the text extraction they wanted. I decided to post the source code here for you. This returns the text from the entire document as a string.
        // Assumes a project reference to the Acrobat interop assembly.
        using System;
        using System.Reflection;
        using System.Text;
        using Acrobat;

        private static string GetText(AcroPDDoc pdDoc)
        {
            StringBuilder text = new StringBuilder();
            // The JavaScript bridge object exposes getPageNumWords and getPageNthWord.
            object jso = pdDoc.GetJSObject();
            if (jso == null)
                return "";
            int pages = pdDoc.GetNumPages();
            for (int i = 0; i < pages; i++)
            {
                try
                {
                    object jsNumWords = jso.GetType().InvokeMember("getPageNumWords", BindingFlags.InvokeMethod, null, jso, new object[] { i }, null);
                    int numWords = Int32.Parse(jsNumWords.ToString());
                    // Word indices are zero-based, so stop before numWords.
                    for (int j = 0; j < numWords; j++)
                    {
                        // Passing false as the third argument keeps punctuation and whitespace.
                        object jsWord = jso.GetType().InvokeMember("getPageNthWord", BindingFlags.InvokeMethod, null, jso, new object[] { i, j, false }, null);
                        text.Append((string)jsWord);
                    }
                }
                catch (Exception)
                {
                    // Skip pages whose text cannot be read.
                }
            }
            return text.ToString();
        }

  • Reading line item text from sales order

    Hi,
    I have a sales order which has an item text, and I need to get the value from that text. I want to test the READ_TEXT function, and I am giving these values:
      Import parameters   Value
      CLIENT              400
      ID                  0011
      LANGUAGE            EN
      NAME                0001171445000010
      OBJECT              VBBP
      ARCHIVE_HANDLE      0
      LOCAL_CAT
    It is not giving me any value. Am I giving anything wrong? In NAME I gave the value as sales order number + item number. Is this correct? Please help me.
    Thanks,
    Veni.

    Hi
    NAME is the concatenation of the order number and the item number.
    Pass the four parameters ID, OBJECT, NAME and LANGUAGE,
    use the correct declarations for the parameters, and use
    ID  = '0011'
    LANGUAGE =  'EN'
    NAME = '0001171445000010'
    OBJECT  = 'VBBP'
    See the doc
    READ_TEXT
    READ_TEXT provides a text for the application program in the specified work areas.
    The function module reads the desired text from the text file, the text memory, or the archive. You must fully specify the text using OBJECT, NAME, ID, and LANGUAGE. An internal work area can hold only one text; therefore, generic specifications are not allowed with these options.
    After successful reading, the system places header information and text lines into the work areas specified with HEADER and LINES.
    If a reference text is used, SAPscript automatically processes the reference chain and provides the text lines found in the text at the end of the chain. If an error occurs, the system leaves the function module and triggers the exception REFERENCE_CHECK.
    Function call:
    CALL FUNCTION 'READ_TEXT'
      EXPORTING
        CLIENT                  = SY-MANDT
        OBJECT                  = ?...
        NAME                    = ?...
        ID                      = ?...
        LANGUAGE                = ?...
        ARCHIVE_HANDLE          = 0
      IMPORTING
        HEADER                  =
      TABLES
        LINES                   = ?...
      EXCEPTIONS
        ID                      =
        LANGUAGE                =
        NAME                    =
        NOT_FOUND               =
        OBJECT                  =
        REFERENCE_CHECK         =
        WRONG_ACCESS_TO_ARCHIVE =
    Export parameters:
    CLIENT
    Specify the client under which the text is stored. If you omit this parameter, the system uses the current client as default.
    Reference field: SY-MANDT
    Default value: SY-MANDT
    OBJECT
    Enter the name of the text object to which the text is allocated. Table TTXOB contains the valid objects.
    Reference field: THEAD-TDOBJECT
    NAME
    Enter the name of the text module. The name may be up to 70 characters long. Its internal structure depends on the text object used.
    Reference field: THEAD-TDNAME
    ID
    Enter the text ID of the text module. Table TTXID contains the valid text IDs, depending on the text object.
    Reference field: THEAD-TDID
    LANGUAGE
    Enter the language key of the text module. The system accepts only languages that are defined in table T002.
    Reference field: THEAD-TDSPRAS
    ARCHIVE_HANDLE
    If you want to read the text from the archive, you must enter a handle here. The system uses it to access the archive. You can create the handle using the function module ARCHIVE_OPEN_FOR_READ.
    The value '0' indicates that you do not want to read the text from the archive.
    Reference field: SY-TABIX
    Default value: 0
    Import parameters:
    HEADER
    If the system finds the desired text, it returns the text header in this parameter.
    Structure: THEAD
    Table parameters:
    LINES
    The table contains all text lines that belong to the text read.
    Structure: TLINE
    Exceptions:
    ID
    The text ID specified in the parameter ID does not exist in table TTXID. It must be defined there together with the object of the text module.
    LANGUAGE
    The parameter LANGUAGE contains a language key that does not exist in table T002.
    NAME
    The parameter NAME contains the name of a text module that does not correspond to the SAPscript conventions.
    Possible errors:
    The field contains only blanks.
    The field contains the invalid characters ‘*’ or ‘,’.
    OBJECT
    The parameter OBJECT contains the name of a text object that does not exist in table TTXOB.
    NOT_FOUND
    The system did not find the specified text module.
    REFERENCE_CHECK
    The text module to be read has no text lines of its own but refers to the lines of another text module. This reference chain can include several levels. For the current text, the chain is interrupted, that is, one of the text modules referred to in the chain no longer exists.
    WRONG_ACCESS_TO_ARCHIVE
    The exception WRONG_ACCESS_TO_ARCHIVE is triggered if an archive is accessed using an incorrect or non-existing archive handle or an incorrect mode (that is, read if the archive is open for writing or vice versa).
    Reward points if useful
    Regards
    Anji
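
    As a small illustration of how the NAME value in this thread is put together, here is a sketch in Python (not ABAP). The helper name `build_text_name` is hypothetical, and the widths (10 digits for the order number, 6 for the item number) are taken from the example value 0001171445000010 above; other text objects may use different widths.

```python
def build_text_name(order_number: str, item_number: str, item_width: int = 6) -> str:
    """Left-pad both parts with zeros and concatenate them into one NAME value."""
    # Assumed layout: 10-digit order number followed by the item number.
    return order_number.zfill(10) + item_number.zfill(item_width)


if __name__ == "__main__":
    print(build_text_name("1171445", "10"))
```

    If the leading zeros are left out, the resulting NAME no longer matches the example value shown in the post.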

  • Can't seem to save non-English as text from PDF using Reader

    I have several PDF documents that were originally generated by OpenOffice from a UTF8-encoded text file. The text is in different languages, e.g. Korean, Arabic, Russian, English. When I open these documents and then "save as text", the resulting text files contain garbage or nothing at all in all cases except for English. Is it possible to extract non-English text from a PDF document using Reader? If not, is there a different product that could be used for this purpose? Thanks much!

    They're using fonts that you don't have on your system so no, it isn't possible with Reader.

  • How to read a local file from Forms10g using PJC/Bean

    I'm trying to read a file on the client machine using a Bean.
    The code in the Bean is:
    File fp = new File(mFileName);
    printToConsole( "AFTER creating a File object " );
    if (fp.exists())
    I get the message after creating File object. But at the next statement - fp.exists() - the Forms session terminates giving a FRM-92100 - Your connection to the server was interrupted.
    Any ideas?
    Thanks in advance for your help.
    Amit

    Frank, Thank you for your response on signing the jar. That was exactly what I was missing. After signing the jar, I am able to do client level operations.
    Grant, Thank you for asking about WebUtil. To give you a background as to what we are trying to do - to integrate our Forms 10g application with another software - in this case a client-server app. We want to send a message from this app. to our Forms10g app.(a pre-defined Form in the app.) which would initiate certain Forms navigation based on the message. In essence, Forms needs to be "listening" to this app. (in a non-blocking mode, the Forms app. should not be "locked-out" while listening for a message)
    The only way I could think of was to write a Bean (modified version of Frank's Dispatch Event sample on his blog) which would spawn a thread and listen for a message from the other app. For our prototype, I started off with watching for a file on the client machine. The final goal is to listen for a message on MQ. This will facilitate us to integrate our Forms10g app. with any other app.
    Regarding WebUtil, I did not find a way to do the listening/polling in a non-blocking way. Maybe I must have missed the obvious - Frank can very well attest to that.
    Any feedback is most welcome.
    Thanks.
    Amit

  • How to read/extract text from pdf

    Respected All,
    I want to read/extract text from a PDF. I tried using Etymon but did not succeed.
    Could anyone guide me on this?
    Thanks and regards,
    Ajay.

    Thank you very much Abhilshit, PDFBox works for reading pdf.
    Regards,
    Ajay.

  • How 2 Copy Header & Line Item Text from Purchase Order 2 Out Bound Delivery

    Hi SD Gurus,
    I want to copy header and line item text from Purchase Order to Out Bound Delivery (This is required in Stock Transfer Process).
    I have been able to successfully configure copying header and line item text from Sales Order to Outbound Delivery, but the configuration doesn't seem to be the same for copying text from PO to OBD.
    Is there any way to achieve the same? Can some expert show the way to achieve this.
    Thanks in advance.
    Warm regards,
    Rahul Mishra

    Hi Ravikumar, thanks for your quick reply.
    This is what is currently coded.
    " Concatenate values to get the item text name for READ_TEXT.
       invar3+0(10) = invar1. "PO number
       invar3+10(5) = invar2. "PO line number
       SELECT SINGLE * FROM stxh WHERE tdobject = 'EKPO'
                                   AND tdname   = invar3
                                   AND tdid     = 'F01'
                                   AND tdspras  = sy-langu.
       IF sy-subrc = 0.
         invar4 = invar3.
    " Read the text for the document items.
         CALL FUNCTION 'READ_TEXT'
           EXPORTING
             id       = 'F01'
             language = sy-langu
             name     = invar4
             object   = 'EKPO'
           TABLES
             lines    = it_itab.
       ENDIF.
    I have seen some POs that contain info record texts, which get pulled by the above code. In those cases the info record text has ID F02, which exists in table STXH, and there is also another text with ID F01, so the table it_itab gets both texts and there is no problem.
    But I came across a PO which has only one text, an info record text with ID F05, which is not stored in STXH and hence does not get pulled by the READ_TEXT function module. How do I change my code to get this text without affecting the other POs?
    As mentioned in the messages above, this F05 text could be retrieved by providing the object name EINE.
    Any help will be appreciated and rewarded.
    thanks

  • How can I Read lines of data from a file starting at the end of the file??

    Can anyone help me with how to read lines of data starting from the end of a file instead of the beginning? I do not want to load the entire file into memory, as the files are very large. Instead I want to start at the end of the file and read lines backward until I find the particular data item I am searching for, then stop.
    Can this be done in Java ? I know it can be done in Perl.
    Thanks.

    Thanks for your suggestion about RandomAccessFile. I did actually think about that approach, but wasn't sure it would work.
    I do not want to read the file sequentially from the start, because the files contain a large number of lines that have already been processed, and there is no need to reprocess them.
    The unprocessed lines are always at the end of the file, and these are the lines I am interested in getting at without having to read the entire file. Therefore I figure that reading the data from the end of the file would be much more efficient.
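
    A minimal sketch of the seek-from-the-end idea discussed in this thread, written in Python for illustration. The same steps (seek to a position, read a block, split on newlines) should map onto Java's RandomAccessFile, but this is not Java code. The function name `read_lines_reversed` and the block size are assumptions, and the sketch assumes UTF-8 text with "\n" line endings.

```python
import os
from typing import Iterator


def read_lines_reversed(path: str, block_size: int = 4096) -> Iterator[str]:
    """Yield the lines of a text file from last to first, one block at a time."""
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        if position == 0:
            return  # empty file: nothing to yield
        tail = b""  # partial line carried over from the previously read block
        at_end = True
        while position > 0:
            read_size = min(block_size, position)
            position -= read_size
            handle.seek(position)
            block = handle.read(read_size) + tail
            lines = block.split(b"\n")
            if at_end and lines[-1] == b"":
                lines.pop()  # ignore the empty piece after a trailing newline
            at_end = False
            # The first piece may be incomplete: its start lies in an earlier block.
            tail = lines.pop(0)
            for line in reversed(lines):
                yield line.decode("utf-8")
        yield tail.decode("utf-8")
```

    Because this is a generator, the caller can stop iterating as soon as the wanted line is found, and only the blocks read up to that point are ever loaded.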

  • How I Got WYSIWYG TLF Text To PDF from Flex

    Awesome results read more here:  http://hybridmindset.com/blog/How-I-Got-WYSIWYG-TLF-Text-To-PDF-from-Flex-Part-1

    Hi,
    Adobe Reader currently doesn't support Text to Speech, but we have added this to our database for consideration in a future release.
    Thanks for your feedback!
    -Gaurav

  • How to read the whole text file lines using FTP adapter

    Hi all,
    How can I read all the lines of a text file when an error occurs in the middle of reading it?
    After the error, the remaining lines are not read. How can I read the whole text file using the FTP adapter?
    Please can you help me?

    Yes, there is: you need to use the uniqueMessageSeparator property. Have a look at the following link for its implementation.
    http://download-west.oracle.com/docs/cd/B31017_01/integrate.1013/b28994/adptr_file.htm#CIACDAAC
    cheers
    James

  • Will not print text from PDFs - all other print is fine - Using nitro reader - Win7- HP4255

    Will not print text from PDFs - all other print is fine - Using nitro reader - Win7- HP4255

    Mulga
    Welcome to the HP Community Forum.
    Have you tried asking your question on the Nitro-Reader Forum?
    Nitro Reader Forum
    If you would like to try using the Adobe Reader, you might find help here:
    Manage Print Output with Print Preview
    See the section on PDF files
    I am pleased to provide assistance on behalf of HP. I do not work for HP. 
    Kind Regards,
    Dragon-Fur

  • How to read a whole text file into a pl/sql variable?

    Hi, I need to read an entire text file, which actually contains an email message extracted from a content management system, into a variable in a PL/SQL package, so I can insert some information from the database and then send the email. I want to read the whole text file in one shot, not just one line at a time. Should I use Utl_File.Get_Raw, or is there another more appropriate way to do this?

    how to read a whole text file into a pl/sql variable?
    your_clob_variable := dbms_xslprocessor.read2clob('YOUR_DIRECTORY','YOUR_FILE');
    ....

  • Copying text from PDF to Pages

    I am trying to copy text from a PDF file into Pages. After pasting the copied text into my new Pages document, the spacing between most of the words is lost.
    For example,
    "Copying text from PDF to Pages" is imported as "CopyingtextfromPDFtoPages"
    Does anyone know how to correct this?
    Imac   Mac OS X (10.4.7)  

    Rishi,
    Welcome to Apple Discussions.
    After reading your post, I tried to duplicate this problem. I opened a PDF, selected a sentence, then copied it to the clipboard. I then opened Pages, selected the blank template, then pasted in the text. It pasted perfectly.
    Does this problem happen with all text in a PDF? With different PDFs?
    -Dennis

  • Editing text from pdf file

    how to edit text from pdf file?

    Adobe Reader does not allow editing the text of a PDF document. You will need to get Acrobat on your Windows or Mac to do that.

  • How to retrieve the item text from VL03 transaction .

    How to retrieve the item text from VL03 transaction .
    The requirement is like this, the item text thus retrieved should be printed in the script under the item.

    Jagadieshwar,
    Use the READ_TEXT function module to get the proper item text of the delivery.
    ID: Probably you want 0002 (Item Note), but it depends on which text you want: Item Note, Material Sales Text, etc.
    NAME: CONCATENATE Delivery Doc. Number + Delivery Item Number (e.g. 0080001729000010)
    OBJECT: VBBP
    LANGUAGE: sy-langu or whatever you want.
