Simple extraction of text

Hello,
I want to extract some data from a text file. An example is this:
ISBN ################## 0760049602.
(# stands for a blank space; there can be any number of them.)
The problem is that the number of blank characters between "ISBN" and the actual number is unreliable: it can be 1 space, 2 spaces, etc. Normally there are at least 7 space characters, so the program does not extract the ISBN number. Here is the code:
int end = 0;
while (end >= 0) {
    int begin = read.indexOf("ISBN", end);
    if (begin >= 0) {
        begin++;
        end = read.indexOf('.', begin);
        if (end >= 0) {
            String ISBNresult = read.substring(begin, end);
            isbnVector.add(ISBNresult);
        }
    } else {
        end = -1;
    }
}
The above code does not return any ISBN numbers.
Please can someone help, as I need to extract the ISBN numbers.
I know you can use the trim method in Java, but I do not know where to use it as I am new to Java.
Thanks for your time and help.

Thanks BalusC, but it is more complex.
There is not only one ISBN; there can be several, so I have a loop which goes through line by line:
int end = 0;
while (end >= 0) {
    int begin = 0, movePosition = 0;
    begin = read.indexOf("ISBN:", end); // find the index of "ISBN:"
    movePosition = 5;
    if (begin == -1) {
        begin = read.indexOf("ISBN", end); // fall back to "ISBN" without the colon
        movePosition = 4;
    }
    if (begin >= 0) { // if the index was found, increment it
        begin++;
        end = read.indexOf(',', begin); // find the index of the comma and use substring to extract the string
        if (end == -1) {
            end = read.indexOf('.', begin);
        }
        // if no ISBN exists, add the value "null" to the vector
        if (end == -1) {
            isbnVector.add("null");
        }
        if (end >= 0) {
            String ISBNresult = read.substring(begin + movePosition, end);
            isbnVector.add(ISBNresult); // add the result to the vector
        }
    } else {
        end = -1;
    }
}
I don't know where to use trim here. Please help.
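For the variable spacing asked about above, one possible approach is to let a regular expression skip the whitespace, or to call trim() on the substring before adding it to the vector. The following is only a minimal sketch: the class name IsbnExtractor, the method names, and the assumption that an ISBN is a run of digits and hyphens (optionally ending in X) are illustrative and do not come from the original posts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IsbnExtractor {
    // "ISBN", an optional colon, any amount of whitespace, then the number.
    // Assumption: the number is digits and hyphens, optionally ending in X.
    private static final Pattern ISBN =
            Pattern.compile("ISBN:?\\s*([0-9][0-9-]*[0-9Xx])");

    /** Returns every ISBN found in the text, in order of appearance. */
    public static List<String> extractAll(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = ISBN.matcher(text);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    /** Same idea with indexOf/substring, using trim() on the extracted piece. */
    public static String extractFirstWithTrim(String text) {
        int begin = text.indexOf("ISBN");
        if (begin < 0) {
            return null;
        }
        begin += "ISBN".length();           // skip past the label itself
        int end = text.indexOf('.', begin);
        if (end < 0) {
            return null;
        }
        return text.substring(begin, end).trim(); // trim() drops the padding spaces
    }

    public static void main(String[] args) {
        String line = "ISBN        0760049602. Another one, ISBN: 0131103628, end.";
        System.out.println(extractAll(line));
        System.out.println(extractFirstWithTrim(line));
    }
}
```

In the indexOf version, trim() is applied to the substring between the label and the terminating character, which is where the padding ends up. The regex version avoids the bookkeeping with begin and movePosition altogether, at the cost of assuming what an ISBN looks like.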

Similar Messages

Extract highlighted text in PDF

Hi,
I want to write a tool which can extract highlighted text from a PDF and export the text into another PDF file. Can somebody give me directions on how to do that?
If you think some tool already has that capability, please let me know. Most of the tools I know just extract a summary, such as the location of the highlighted portion, but not the actual text associated with it.
Regards,
Avesh

Hi,
Questions about extracting text from PDF files are beyond the scope of the C# forum; please refer to:
http://www.adobe.com/support/acrobat/
Harry

HOW TO display a simple line of text if the user isn't logged in and not display it if they are?!?!?

There must be a way to display a simple line of text (or a link) if the user isn't logged in (as a wholesale customer), and not display it if they are.
It's a basic <php echo> kind of function... there must be an equivalent in this BC system. Anyone? Please!
thank you
Chris.

OK, after chatting with live support I've discovered {module_isloggedin}.
Now I just need to integrate it into my Online Store modules!

How to extract ALL text on a line, including extra white space, from PDF?

I have written my first scripts to parse through PDF files and successfully locate the beginning and end of the text portions that I wish to copy/extract. For example, one of the scripts reads through PDF files and locates patterns that allow me to copy/extract a lengthy title that spans several lines on the page. My remaining problem is that sometimes the text I wish to copy/extract has tabs or lots of white space between the words, and I need to retain ALL the "white space" between words. Basically, I am successful at using getPageNthWord(j, i, false) along with "Quad coordinate sets" to get a word at a time for all the words on the multiple lines, but it seems to be skipping over the tabs and extra white space between the words. In other words, I need to extract/copy a full line at a time, including all white space, instead of being limited to just a word at a time.
Can anyone help me with an algorithm to extract/copy text, not just the words but also including tabs and other extra white space characters? Is there some type of Regular Expression I should be using, or maybe some additional parameter used for getPageNthWord that would retain all the extra white space?
Thanks in advance,
Ted L

Did you try setting the bStrip parameter of getPageNthWord to false? It will
preserve some of the whitespace, but not all of it.
As far as I know, that's the only way to do it via a script.

Is it possible to extract the text and images using PHP

Hi friends,
Is it possible to extract the text and images using PHP, in the same order as they appear in the PDF?
Otherwise, is it possible to extract the same content as XML using PHP or ASP?
I googled it, but in vain. Any help is appreciated.
Thanks in advance.

Dear Mike,
Thanks for your quick reply,
I mean, is it possible to parse the PDF line by line using PHP?
I extracted the whole text but couldn't extract the images, since the PDF's images are encoded using various filters such as DCTDecode, JPXDecode, etc.
The text is encoded with FlateDecode, which I decoded using a function in PHP.
Thanks,

Extraction of Texts in Infospokes

I have an ODS for which I have created an infospoke. I have all the infoobjects selected, but the problem I am facing is that when I run the extract it does not include the texts for a given infoobject. For example, the extract for the infoobject JOBROLE does not include the text description.
I do not want to extract the text fields into a separate file; I want to include them in the same extract. I understand that this is possible using BADIs, but I am not sure how to add the text table to the BADI target structure.
Any help would be appreciated.
Thanks,
Rahul

Rahul,
1. For that, tick the check box on the Transformation tab of the BADI.
2. Before writing the code, enhance your target structure with a field to hold the text description for the infoobject you want.
3. Double-click on the Add-in implementation value.
4. Enter a name for the Implementation Short Text.
5. In the 'Defined Filters' screen, select your infospoke name from the dropdown list.
6. Go to the Interface tab; 'ABAP code' should be selected by default. Double-click on the TRANSFORM method name.
7. Write your ABAP code in the next screen.
method IF_EX_OPENHUB_TRANSFORM~TRANSFORM.
  data: l_s_data_out type target-structure-name.
  clear e_t_data_out.
  loop at i_t_data_in into l_s_data_out.
    l_s_data_out-appendedfield = give the text table info. here /bic/t......
    insert l_s_data_out into table e_t_data_out.
  endloop.
endmethod.
This is not perfect code, but yours will look almost like this. Replace target-structure-name and appendedfield with your own names, then modify the code above and play with it.
Thanks.

Extract Blog Text From Domain File?

I have a year's blogging in a domain file. I've taken down the blog and started a new one, but would like to extract the text from the old year-long blog.
Is there an efficient way to do this other than switching that domain file back and going day by day?

There's a file in the blog domain package titled index.xml.gz. If you expand that file to get the index.xml file inside you might find the text. You will recognize the text in there but it's buried in a lot of "junk".
If you're starting a new blog for next year I suggest you use one of the blogging sites like Posterous, that are much more powerful and will let you manipulate the contents nearly any way you'd like. You can even embed that blog into one of your iWeb pages like in this demo page: Embed a Site Within an iWeb Page.
Using a 3rd party blogging site will also let you add to it online from any computer, Mac or PC or from mobile devices when on the road.
Happy New Year

How to extract sql text with SID and SERIAL#

Hi,
I am new to Oracle Database and have recently started my journey in performance tuning.
I need to extract the SQL text fired by a user, based on the SID and SERIAL#.
Thanks in advance..
prabha

seankim wrote:
Hi~
select a.sid, a.serial#, b.sql_fulltext
from   v$session a, v$sql b
where decode(a.sql_id,null,a.prev_sql_id, a.sql_id)=b.sql_id
and    decode(a.sql_id,null,a.prev_child_number, a.sql_child_number)=b.child_number
and    a.sid=&sid
and    a.serial#=&serial;
Also a bad idea - have you checked the execution plan?
Do you think it might be a good idea to think about the need for statistics on fixed objects?
Here's a possible plan from 11.1.0.7 - and it's not very nice.
| Id  | Operation                   | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT            |                   |     1 |  2182 |     0   (0)| 00:00:01 |
|   1 |  NESTED LOOPS               |                   |     1 |  2182 |     0   (0)| 00:00:01 |
|   2 |   MERGE JOIN CARTESIAN      |                   |     1 |  2075 |     0   (0)| 00:00:01 |
|   3 |    NESTED LOOPS             |                   |     1 |    39 |     0   (0)| 00:00:01 |
|*  4 |     FIXED TABLE FIXED INDEX | X$KSLWT (ind:1)   |     1 |    26 |     0   (0)| 00:00:01 |
|*  5 |     FIXED TABLE FIXED INDEX | X$KSLED (ind:2)   |     1 |    13 |     0   (0)| 00:00:01 |
|   6 |    BUFFER SORT              |                   |     1 |  2036 |     0   (0)| 00:00:01 |
|*  7 |     FIXED TABLE FULL        | X$KGLCURSOR_CHILD |     1 |  2036 |     0   (0)| 00:00:01 |
|*  8 |   FIXED TABLE FIXED INDEX   | X$KSUSE (ind:1)   |     1 |   107 |     0   (0)| 00:00:01 |
Predicate Information (identified by operation id):
   4 - filter("W"."KSLWTSID"=82)
   5 - filter("W"."KSLWTEVT"="E"."INDX")
   7 - filter("INST_ID"=USERENV('INSTANCE'))
   8 - filter("S"."INDX"=82 AND "S"."KSUSESER"=53 AND "S"."INST_ID"=USERENV('INSTANCE')
              AND BITAND("S"."KSSPAFLG",1)<>0 AND BITAND("S"."KSUSEFLG",1)<>0 AND
              "KGLOBT03"=DECODE("S"."KSUSESQI",NULL,"S"."KSUSEPSI","S"."KSUSESQI") AND
              "KGLOBT09"=DECODE("S"."KSUSESQI",NULL,DECODE("S"."KSUSEPCH",65535,TO_NUMBER(NULL),"S"."KSUSEPCH"),DECODE("S"."KSUSESCH",65535,TO_NUMBER(NULL),"S"."KSUSESCH")))
You need to avoid the "full tablescan" of the library cache.
Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com

Simple centering of text on page?

I am new to Buzzword, and am trying to center individual lines on a page of a new document I am typing. How do I do this? On this screen for creating a new discussion topic on the forum I see a quick and easy icon that centers text, but nowhere on my Buzzword document screen do I see such an icon, or a reference to a drop-down menu to "center."

Thank you Michelle, very helpful! One more question: how do you close the toolbar (paragraph or otherwise) once it has been opened? I assume that it automatically closes if a function is selected, but if this is not the case, I cannot find the way to close what has been opened!
Date: Wed, 10 Feb 2010 17:05:31 -0700
Subject: Simple centering of text on page?
Hi,
Thanks for your post. The center text option is somewhat hidden - it's on the Paragraph toolbar (not the default Font toolbar). To select the Paragraph toolbar, choose the small orange paragraph symbol on the far right of the screen. It will open and slide over to the left, then you will see the text - alignment options.
http://forums.adobe.com/servlet/JiveServlet/showImage/19820/center_text.jpg
Hope this helps!
Let me know if you have additional questions as you get used to Buzzword.
Best,
Michelle

How to use automator to extract specific text from json txt file

I'm trying to set up an Automator folder action to extract certain data from json files. I'm pulling metadata from YouTube videos, and I want to extract the Title of the video, the URL for the video, and the date uploaded.
Sample json data excerpts:
"upload_date": "20130319"
"title": "[title of varying length]"
"webpage_url": "https://www.youtube.com/watch?v=[video id]"
Based on this thread, seems I should be able to have Automator (or any means of using a shell script) find data and extract it into a .txt file, which I can then open as a space delimited file in Excel or Numbers. That answer assumes a static number of digits for the text to be extracted, though. Is there a way Automator can search through the json file and extract the text - however long - after "title" and "webpage_url"?
json files are all in the same folder, and all end in .info.json.
Any help greatly appreciated!

Hello
You might try the following perl script, which will process every *.json file in current directory and yield out.csv.
* CSV currently uses space for field separator as you requested. Note that Numbers.app cannot import such CSV file correctly.
#!/bin/bash
/usr/bin/perl -CSDA -w <<'EOF' - *.json > out.csv
use strict;
use JSON::Syck;
$JSON::Syck::ImplicitUnicode = 1;
# json node paths to extract
my @paths = ('/upload_date', '/title', '/webpage_url');
for (@ARGV) {
    my $json;
    open(IN, "<", $_) or die "$!";
    {
        local $/;
        $json = <IN>;
    }
    close IN;
    my $data = JSON::Syck::Load($json) or next;
    my @values = map { &json_node_at_path($data, $_) } @paths;
    {
        #   output CSV spec
        #   - field separator = SPACE
        #   - record separator = LF
        #   - every field is quoted
        local $, = qq( );
        local $\ = qq(\n);
        print map { s/"/""/og; q(").$_.q("); } @values;
    }
}
sub json_node_at_path ($$) {
    #   $ : (reference) json object
    #   $ : (string) node path
    #   E.g. Given node path = '/abc/0/def', it returns either
    #       $obj->{'abc'}->[0]->{'def'}   if $obj->{'abc'} is ARRAY; or
    #       $obj->{'abc'}->{'0'}->{'def'} if $obj->{'abc'} is HASH.
    my ($obj, $path) = @_;
    my $r = $obj;
    for ( map { /(^.+$)/ } split /\//, $path ) {
        if ( /^[0-9]+$/ && ref($r) eq 'ARRAY' ) {
            $r = $r->[$_];
        }
        else {
            $r = $r->{$_};
        }
    }
    return $r;
}
EOF
For Automator workflow, you may use Run Shell Script action as follows, which will receive json files and yield out_YYYY-MM-DD_HHMMSS.csv on desktop.
Run Shell Script action
    - Shell = /bin/bash
    - Pass input = as arguments
    - Code = as follows
#!/bin/bash
/usr/bin/perl -CSDA -w <<'EOF' - "$@" > ~/Desktop/out_"$(date '+%F_%H%M%S')".csv
use strict;
use JSON::Syck;
$JSON::Syck::ImplicitUnicode = 1;
# json node paths to extract
my @paths = ('/upload_date', '/title', '/webpage_url');
for (@ARGV) {
    my $json;
    open(IN, "<", $_) or die "$!";
    {
        local $/;
        $json = <IN>;
    }
    close IN;
    my $data = JSON::Syck::Load($json) or next;
    my @values = map { &json_node_at_path($data, $_) } @paths;
    {
        #   output CSV spec
        #   - field separator = SPACE
        #   - record separator = LF
        #   - every field is quoted
        local $, = qq( );
        local $\ = qq(\n);
        print map { s/"/""/og; q(").$_.q("); } @values;
    }
}
sub json_node_at_path ($$) {
    #   $ : (reference) json object
    #   $ : (string) node path
    #   E.g. Given node path = '/abc/0/def', it returns either
    #       $obj->{'abc'}->[0]->{'def'}   if $obj->{'abc'} is ARRAY; or
    #       $obj->{'abc'}->{'0'}->{'def'} if $obj->{'abc'} is HASH.
    my ($obj, $path) = @_;
    my $r = $obj;
    for ( map { /(^.+$)/ } split /\//, $path ) {
        if ( /^[0-9]+$/ && ref($r) eq 'ARRAY' ) {
            $r = $r->[$_];
        }
        else {
            $r = $r->{$_};
        }
    }
    return $r;
}
EOF
Tested under OS X 10.6.8.
Hope this may help,
H
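For readers who would rather not depend on a Perl JSON module, the variable-length part of the question (pulling out whatever follows "title" or "webpage_url", however long) can also be sketched with a regular expression. This is only an illustration in Java, not the approach used in the reply above: the class name JsonFieldGrabber and the method stringField are made up, it assumes the wanted values are double-quoted strings, and it is not a real JSON parser (escape sequences are left undecoded).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JsonFieldGrabber {
    /**
     * Returns the raw string value that follows "key": in the text, or null.
     * Assumption: the value is a double-quoted JSON string. Escaped quotes
     * inside the value are skipped over, but escapes are not decoded.
     */
    public static String stringField(String json, String key) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(key) + "\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"upload_date\": \"20130319\", "
                + "\"title\": \"A title of any length\", "
                + "\"webpage_url\": \"https://www.youtube.com/watch?v=abc123\"}";
        // One tab-separated record: upload date, title, URL
        System.out.println(String.join("\t",
                stringField(json, "upload_date"),
                stringField(json, "title"),
                stringField(json, "webpage_url")));
    }
}
```

Because titles can contain spaces, a tab (or a properly quoted CSV) is a safer field separator than a single space when the result is meant for a spreadsheet. A real JSON library is the more robust choice for nested or unusual input.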

Extract PDF Text

Hi, just a quick question to see whether anyone here knows whether or not the iText library is capable of text extraction? I believe pages can be extracted using this library for insertion into new documents but am not so sure about extraction of text only...
Additionally, was wondering whether this function was available in the Multivalent library similarly?
Thanks very much,
R B
P.S. I am aware of the PDFBox and JPedal libraries :-)

Thanks for the help but it seems PDFCopy is for copying pages of a PDF document directly to another. I am now pretty convinced that iText cannot extract text having just read the following:
" iText can't convert a text in PDF to some other 'readable' document format such as RTF, WORD or even plain text "
at:
http://itextdocs.lowagie.com/tutorial/general/copystamp/
However, I am still interested to know about Multivalent,
Thanks,
Ross

Extracting Textfield Text from Display List

I am attempting to extract the text in a textfield that is
displayed with the container (see below) display list. The code
snippet I am using is listed below.
for (i = 0; i < container.numChildren; i++) {
    var temp:TextField = new TextField();
    // temp = container.getChildAt(i);
    trace(container.getChildAt(i));
    trace(temp.text);
}
When I execute this, the trace(container.getChildAt(i)) line displays <object TextField>. But when I uncomment the line temp = container.getChildAt(i); I get an error about coercing a static variable. Can anyone give me a clue how I can trace the text in the text field container.getChildAt(i)?

I found some example code in help->classes->staticText that I modified as below to solve the problem.
for (i = 0; i < container.numChildren; i++) {
    var displayitem:DisplayObject = container.getChildAt(i);
    if (displayitem is TextField) {
        trace("a static text field is item " + i + " on the display list");
        var myFieldLabel:TextField = TextField(displayitem);
        trace("and contains the text: " + myFieldLabel.text);
    }
}

EXTRACTing to TEXT file in Data Warehouse - Simple doubts!

Hi Experts,
Please clarify my simple doubts about a data extraction program (it extracts data from SAP to a text file on the application server and runs in the background).
For the data warehouse mapping, I have been asked to make the following changes:
1) At present there are no column headings in the text file, so I need to add column headings. How do I get that done?
2) At present the file is not tab-delimited, so I need to make it tab-delimited. How do I achieve that?
I am pasting a piece of the code here so that you can understand it better.
PERFORM open_dataset_zdata_whouse_04.
    DESCRIBE FIELD i_tab LENGTH tfr_length IN BYTE MODE.
    LOOP AT i_itab.
      TRANSFER i_itab TO transfer_file1 LENGTH tfr_length.
    ENDLOOP.
    CLOSE DATASET transfer_file1.
Thank you.

See the code below:
parameters: d1 type localfile default '/usr/sap/TST/SYS/Test.txt'.
data: begin of itab occurs 0,
        field1(20) type c,
        field2(20) type c,
        field3(20) type c,
      end of itab.
data: str type string.
constants: con_tab type x value '09'.
* If you have a newer version, you can use this instead:
* constants: con_tab type c value cl_abap_char_utilities=>horizontal_tab.
start-of-selection.
  itab-field1 = 'ABC'.
  itab-field2 = 'DEF'.
  itab-field3 = 'GHI'.
  append itab.
  itab-field1 = '123'.
  itab-field2 = '456'.
  itab-field3 = '789'.
  append itab.
  open dataset d1 for output in text mode.
  loop at itab.
    translate itab using ' # '.
    concatenate itab-field1 itab-field2 itab-field3 into str
                separated by con_tab.
    translate str using ' # '.
    transfer str to d1.
  endloop.
  close dataset d1.
The above code is for tab-delimited output.
For the heading, you can write simple logic in the loop over the internal table:
loop at itab.
  if sy-tabix = 1.
*   move the heading data to the file here
  endif.
endloop.
Thanks
Seshu
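The two requirements in this thread (one heading line, then tab-delimited rows) are independent of ABAP, so the logic can be sketched in a few lines of Java for comparison. This is a hedged illustration only: the class name TabDelimitedWriter, the method toLines, and the heading names FIELD1 to FIELD3 are assumptions, and the sketch builds the lines in memory instead of writing to an application-server dataset.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TabDelimitedWriter {
    /** Builds the file content: one heading line, then one tab-joined line per row. */
    public static List<String> toLines(String[] headings, List<String[]> rows) {
        List<String> lines = new ArrayList<>();
        lines.add(String.join("\t", headings)); // the heading goes out once, before the data
        for (String[] row : rows) {
            lines.add(String.join("\t", row));  // fields of each row separated by a tab
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"ABC", "DEF", "GHI"},
                new String[] {"123", "456", "789"});
        // Hypothetical heading names, for illustration only
        String[] headings = {"FIELD1", "FIELD2", "FIELD3"};
        for (String line : toLines(headings, rows)) {
            System.out.println(line);
        }
    }
}
```

Writing the heading before the loop avoids testing the row index on every pass, which is the same effect the sy-tabix = 1 check aims for.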

Finding text and using it to extract other text

Hello,
I am trying to use the Acrobat SDK to write a Python script to do the following:
1. Search for a specific piece of text (say, "ABC"). This text is constant and will always be on the first page of the PDF document that I'm processing.
2. Next, I want to select the text that occurs next to ABC. This second text is not always the same, nor is it always of the same length.
3. Once I have selected the second piece of text, I want to extract it for subsequent processing in my Python script.
4. Both ABC and the second piece of text occur in the same row of a table.
I have gotten as far as finding ABC using the AVDoc.FindText method, but I am not sure how to proceed to the next step. I would be grateful for suggestions on how to go forward.
Thanks!
mindmystique

You'd probably need to use JavaScript. However, the interface for running JavaScript is specific to VB and you might not be able to interface from other languages. Assuming you can move to VB (or part of the app in VB), you can use PageGetNthWord to retrieve each word in turn from the first page. This has nothing to do with selection, but you will get each word in turn.

Looking for a simple voice to text app

I am looking for a simple app that will recognize my contacts and create voice to text messages. What is the best? Thanks for your help!

Thanks, this app is simple and it works well. It has good voice recognition.
