Extracting text from a web page

I want to extract information about livestock from a certain web page, and add these data to a database.
The web page in question allows me to enter an animal's ID number, and then it displays various data about that individual animal. The format of the page is always the same, but the data vary depending on the particular ID/animal.
What I'm hoping for is a way to quickly extract information about the current animal (date of birth, color, breeding status, etc.) from the open page, and add these to my database.
I realize I can "cut and paste", but that is awfully slow for large numbers of data and animals.
Since the data I need are always in the same places on the page, it seems like there must be a way to automate the extraction.
Any ideas? Thanks in advance.
Eric

Here's a very rough, untested way to do it:
(I'm at work and not on a Mac, so your mileage may vary.)
Run it as script.pl -i input -o output
e.g. ./script.pl -i myfilea -o myfileb
The input file is a list of valid web page addresses, one per line,
e.g. http://www.foo.com
--bEGIN cODE---
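#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use LWP::Simple;

# Allow the -i (input file) and -o (output file) command-line options
our ($opt_i, $opt_o);
getopts("i:o:");
die "Usage: $0 -i input -o output\n" unless defined $opt_i && defined $opt_o;

# Open the list of URLs for reading and the output file for writing
open(my $in,  '<', $opt_i) || die "Can't open input: $!";
open(my $out, '>', $opt_o) || die "Can't open output: $!";

# Read the input line by line; each line is one web address
while (my $line = <$in>) {
    chomp $line;
    next unless length $line;
    # Fetch the page; get() returns the HTML as one string, or undef on failure
    my $doc = get($line);
    next unless defined $doc;
    # Go through every line of text in the page
    foreach my $docline (split /\n/, $doc) {
        # Keep only lines containing a colon (assumed to be "label: value" lines)
        next unless $docline =~ m/:/;
        # Remove or reformat HTML formatting. The first three patterns were garbled
        # in the original post; &nbsp;, <br> and <b> are assumed here.
        $docline =~ s/&nbsp;/ /g;
        $docline =~ s/<br>//gi;
        $docline =~ s/<b>//gi;
        $docline =~ s/<\/b>//gi;
        # Simplistic, but for now just write the cleaned line to the output file
        print $out "$docline\n";
    }
}
close($in);
close($out);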
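
For comparison, here is a minimal Java sketch of the same idea applied to the HTML of one page. It assumes each field appears as a bold label followed by a colon and a value (for example `<b>Color:</b> Black<br>`). That layout, the class name FieldExtractor, and the sample values are assumptions for illustration, not the real site's markup.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FieldExtractor {
    // Matches "<b>Label:</b> value" up to the next tag or end of line.
    // The markup shape is an assumption; adjust it to the real page.
    private static final Pattern FIELD = Pattern.compile(
        "<b>\\s*([^<:]+):\\s*</b>\\s*([^<\\r\\n]*)", Pattern.CASE_INSENSITIVE);

    // Returns label -> value pairs in the order they appear on the page.
    public static Map<String, String> extractFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = FIELD.matcher(html.replace("&nbsp;", " "));
        while (m.find()) {
            fields.put(m.group(1).trim(), m.group(2).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample page; a real run would use the downloaded HTML.
        String page = "<b>Date of birth:</b> 2004-03-17<br>\n"
                    + "<b>Color:</b>&nbsp;Black<br>\n"
                    + "<b>Breeding status:</b> Open<br>";
        // One tab-separated line per field, ready to import into a database.
        extractFields(page).forEach((label, value) -> System.out.println(label + "\t" + value));
    }
}
```

Because the page format is always the same, the pattern only has to be tuned once; the returned map can then be written to the database one animal at a time.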

Similar Messages

  • How to read text from a web page

    I want to read text from a web page. Can anybody tell me how to do it?

    OK, here are the details. Visit the site "http://seriouswheels.com/" and you will see an index from A to Z, which is basically an index of car names. I want to read each page, get each car's name and model, and store them in a database. If you can provide the code, I will be very thankful.

  • I did a COPY of some text from a web page, and then did a PASTE into notepad.exe (Windows). The text from each line was duplicated -- on the line! Instead of "Fred", it became "Fred Fred".

    I just recently installed Firefox for the first time. It seems nice and quick. The version is reported as: "10.0.1".
    I wanted to save some text from a web page, so navigated to that page, selected the text, and pressed the Control-C combination to COPY the selected text to the buffer. For example, the text I selected looked something like this:
    Harry
    Ron
    Hermione
    Hagrid
    Albus
    NOTE: Each line of text has a small icon to the left of the text.
    It is not reasonable to COPY and PASTE each line individually, as there can be hundreds of lines of data. I recall, however, that doing a COPY and PASTE on this data into Microsoft's Excel will produce cells which have the icons included in the cell, but unfortunately one can't get rid of them! At least I've never found a way to remove them, but that's another issue. :)
    Once I'd done the COPY operation I switched to a Notepad window and did a PASTE operation. To my surprise, the text from each line was duplicated. It looked like this:
    Harry Harry
    Ron Ron
    Hermione Hermione
    Hagrid Hagrid
    Albus Albus
    Thinking that there might be something unusual about the web page I looked at the source, but it appeared "normal" -- that is, as expected.
    Note: I have done this operation several times before, and have never seen this occur before.
    Note: In the actual data some of the lines have quoted text in them. Curiously there is weird behavior on these lines. In some cases the entire line is shown only once. (These occur at the top of the line, and the quoted text is at the beginning of the name.)
    When quoted text appears "later" in the name, in some cases the quoted text is duplicated, and in other cases the quoted text is missing altogether! I have also noticed an error with the quoted text, and so will be reporting that to the web site which generates the HTML.
    Note that each line of "text" is "anchor text", so if I click on a name the browser navigates to a page for that name.
    I believe that the problem is that the COPY operation in Firefox is not simply copying the visible text, but also the ALT= text of each icon image.
    Below is a sample of what the source HTML looks like:
    <a class="lnk" target="_blank" href="http://details.aspx?id=Harry">
    <img width="16" height="16" alt="Harry" class="tb_icon" src="http://.../Harry.gif"/>
    <span>Harry</span></a>
    <br/>
    (Because of the true length of the lines in the source HTML, I have stripped out the actual URL of the site.)
    To make sure I wasn't imagining this difference I repeated the process within Internet Explorer. In that browser I did not get duplicated data.

    Try:
    *Extended Copy Menu (fix version): https://addons.mozilla.org/firefox/addon/extended-copy-menu-fix-vers/

  • Extracting info from a web page

    Hi,
    I'm not sure if I'm asking this question in the right forum.
    Can anyone tell me if there is a way to extract data from a web page?
    For example, a web site such as Yahoo displays stock quotes or NASDAQ values updated almost in real time.
    If I want to get that information from the web page into one of my applications, something that uses that data, is there a way to do it?
    Just curious

    Yes, it's possible. You can use the java.net.URL class to connect to a website and download the HTML. The coding is not trivial, and you should also be mindful not to redistribute data you got from another site without permission.
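
    A minimal sketch of the java.net.URL approach mentioned above. The class name PageFetcher, the UTF-8 assumption, and the placeholder address http://www.example.com are illustrative choices, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class PageFetcher {
    // Reads a whole stream into a string, assuming UTF-8 text
    // (an assumption; a real page may declare another encoding).
    public static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Downloads the HTML behind the given address.
    public static String fetch(String address) {
        try {
            return readAll(new URL(address).openStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder address; pass the real page as the first argument.
        System.out.println(fetch(args.length > 0 ? args[0] : "http://www.example.com"));
    }
}
```

    The returned string is raw HTML, so the application still has to locate the quote values in it, for example with a pattern matched against the page's fixed layout.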

  • How can I use Automator or AppleScript to get text from a web page and paste it in Excel?

    I don't know how to write scripts or complex Automator workflows, which is why I'm asking.
    I'm trying to make a simple app or script that asks me what text to extract from a web page, such as a name, address and phone number, and pastes each of these into the right cell in Excel.
    I was thinking of having Automator or an AppleScript prompt me for the text to extract from the page, or search the page's HTML for specific tags, extract the text from them, and then import or paste it into the specified Excel cell: name in the name cell, address in the address cell, and so on.
    Can somebody help me write this script?
    If you know an alternative, such as software that already does this or another language to use, please tell me.

    Try holding down the alt key as you mark the text to be copied. You can then copy columns to table text.

  • How do I copy text from a web page in Safari?

    I've searched up and down and can't find the answer to this simple question.
    There is a UI element that I want to copy to the clipboard and then paste into Excel. The UI element is:
    static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary"
    The contents of this static text on the web page is "$1,000.00"
    How can I copy this to the clipboard?
    I've tried:
    select static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary"
    keystroke "c" using command down
    keystroke "l" using command down
    keystroke "v" using command down
    but it doesn't work. "Select" doesn't actually seem to select anything. However, when I run this from within Script Editor, in the Results Window I get:
    {static text "$1000.00" of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary" of application process "Safari" of application "System Events"}
    ... I'm confused as to what this is telling me. All I want is to copy this value to the clipboard. Any suggestions???
    Thanks,
    Jeff

    Try this:
    set the clipboard to item 1 of (get name of (static text of group 104 of UI element 1 of scroll area 1 of group 4 of window "Account Summary" of application process "Safari"))

  • Link to specific text from another web page

    Hello all
    Is it possible to link from one page to a specific piece of
    text in another web page in the same site? I have tried named
    anchor, hyperlink, etc but it just goes to the page rather than the
    text. I don't know if I am attempting the impossible. Can you help?
    Thank v much.

    > This also has a css rule so this 9pt thing is left over
    from copy and
    > pasting
    > from Word. I have it cleaned up as you suggested but I
    still have:
    > <span style="font-family:Arial; font-size:9.0pt; is
    there way of avoiding
    > this?
    It depends on your settings for how you copy/paste. See your
    PREFERENCES
    for those. I don't get such things because my settings are
    'tight' in that
    they are restrictive to what stying is carried into the page.
    > How I do this in a nav bar?
    Change this -
    .nav {
    font: bold normal 14px/normal Arial, Helvetica, sans-serif;
    text-transform: none;
    color: #003300;
    font-weight: bold;
    }
    a.nav:link {
    font: normal 12px/normal Arial, Helvetica, sans-serif;
    text-transform: none;
    color: #003300;
    text-decoration: none;
    }
    a.nav:visited {
    font: normal 12px/normal Arial, Helvetica, sans-serif;
    text-transform: none;
    color: #003300;
    text-decoration: none;
    }
    to this -
    .nav {
    text-align:center;
    }
    .nav a {
    font: bold normal 12px/normal Arial, Helvetica, sans-serif;
    color: #030;
    font-weight: bold;
    text-decoration:none;
    margin-right:35px;
    }
    and then change this -
    <div align="center"><a href="index.html"
    class="nav">Home</a>       <span
    class="nav">About
    Us</span>      <a
    href="impact_HE.html" class="nav">Impact of HE
    Proposal</a>      <a
    href="purpose_ED.html"
    class="nav">The Purpose of
    ED</a>      <a
    href="procedures.html"
    class="nav">Procedures </a>     <a
    href="what_can_we_do.html" class="nav">What Can We
    Do?</a></div>
    to this -
    <div class="nav"><a
    href="index.html">Home</a><a
    href="about_us.html">About
    Us</a><a href="impact_HE.html">Impact of HE
    Proposal</a><a
    href="purpose_ED.html">The Purpose of ED</a><a
    href="procedures.html">Procedures</a><a
    href="what_can_we_do.html">What Can
    We Do?</a></div>
    But an even better way would be to make the menu an unordered
    list, like
    this -
    <ul>
    <li><a
    href="index.html">Home</a></li>
    <li><a href="about_us.html">About
    Us</a></li>
    <li><a href="impact_HE.html">Impact of HE
    Proposal</a></li>
    <li><a href="purpose_ED.html">The Purpose of
    ED</a></li>
    <li><a
    href="procedures.html">Procedures</a></li>
    <li><a href="what_can_we_do.html">What Can We
    Do?</a></li>
    </ul>
    And use this CSS -
    .nav {
    text-align:center;
    }
    .nav ul {
    list-style-type:none;
    margin:0;
    padding:0;
    overflow:hidden;
    }
    .nav li {
    float:left;
    width:150px;
    margin-right:5px;
    border-right:1px solid green;
    }
    .nav a {
    font: bold normal 12px/normal Arial, Helvetica, sans-serif;
    color: #030;
    font-weight: bold;
    text-decoration:none;
    }
    If you want to make the current page look like it doesn't
    have a link, then
    do this -
    <ul>
    <li><a href="index.html"
    id="home">Home</a></li>
    <li><a href="about_us.html" id="about">About
    Us</a></li>
    <li><a href="impact_HE.html" id="impact">Impact
    of HE Proposal</a></li>
    <li><a href="purpose_ED.html" id="purpose">The
    Purpose of ED</a></li>
    <li><a href="procedures.html"
    id="procedure">Procedures</a></li>
    <li><a href="what_can_we_do.html" id="what">What
    Can We Do?</a></li>
    </ul>
    and add this to each page (use the proper ID) -
    a#about {
    cursor:default;
    /* any other styles you want to make the current page show */
    }
    > I am being a complete idiot today (maybe it's because
    it's Sunday) Can you
    > talk me through this.
    Make the changes manually in code view.
    If you are getting the idea that you will need to ramp your
    HTML and CSS
    skills to do this stuff, you are right on target!
    Murray --- ICQ 71997575
    Adobe Community Expert
    (If you *MUST* email me, don't LAUGH when you do so!)
    ==================
    http://www.projectseven.com/go
    - DW FAQs, Tutorials & Resources
    http://www.dwfaq.com - DW FAQs,
    Tutorials & Resources
    ==================
    "Dottydog" <[email protected]> wrote in
    message
    news:[email protected]...
    > Dear Murrray
    > Thank you very much for your comments.
    >
    > Can I ask you the following:
    > 1. p.MsoNormal {
    > Using Microsoft Word to build an HTML page is not
    advisable. Use DW only
    > or
    > clean up the Word markup
    >
    > This also has a css rule so this 9pt thing is left over
    from copy and
    > pasting
    > from Word. I have it cleaned up as you suggested but I
    still have:
    > <span style="font-family:Arial; font-size:9.0pt; is
    there way of avoiding
    > this?
    >
    > 2.
           <span
    class
    > Using non-breaking spaces as a layout tool is not
    advisable. Use CSS
    > margins/padding instead..
    >
    >
    > 3. This is not a named anchor point. It is not even a
    link. For that you
    > would need -
    > <p style="font-family:Arial,helvetica,sans-serif;
    font-size:small; "
    > id="BDBC">What we have said to
    B&amp;DBC</p>
    >
    > I am being a complete idiot today (maybe it's because
    it's Sunday) Can you
    > talk me through this.
    >
    > Thank you very much for your time. Much appreciated.
    >
    >
    >
    >
    >

  • Printing light-colored text from a Web page

    On a Web page that contains some light-colored text (such as the Apple Store MacBook models page <http://store.apple.com/1-800-MY-APPLE/WebObjects/AppleStore.woa/wa/RSLID?nnmm= browse&node=home/shop_mac/family/macbook&sf=wHF2F2PHCCCX72KDY>), is it possible to have the light gray text print darker? When I print such pages on my color laser printer, that gray text is almost too faint to see. Thanks for any pointers.

    Welcome to the Forums!
    Depends mostly on the printer you are using, which will determine the settings available. In the Print dialog box, click on the pop-up menu that shows as Copies & Pages. Look for options that deal with color. For example, an inkjet printer I have shows a Color Management setting, with an option for "pure black and white" - that setting would print the light gray text as black. On the other hand, for an HP Color Laser printer, the Color Options doesn't provide any settings that are useful for what you want to do (even under the advanced options). The best I could find was in ColorSync (that choice will be present for all printers), choose "Lightness Decrease" from the Quartz Filter pop-up menu. That seems to darken the printed gray text a little bit.
    Hope this helps...

  • How can I extract information from a web page

    I want to read some information from a web page and then put that information into a table. Since I only know the URL, not a file path, using BufferedReader(new FileReader(FilePath)) throws a FileNotFoundException. How can I do that? Thanks a lot.

    You can do it the following way.
    First, create a URL object for the address, then open a connection and read the response line by line:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;

    public class PageReader {
        public static void main(String[] args) {
            StringBuilder buf = new StringBuilder();
            try {
                URL myurl = new URL("http://www.xyz.com/index.html"); // specify your URL here
                URLConnection conn = myurl.openConnection();
                conn.connect();
                // BufferedReader replaces the deprecated DataInputStream.readLine()
                try (BufferedReader data = new BufferedReader(
                        new InputStreamReader(conn.getInputStream()))) {
                    String line;
                    while ((line = data.readLine()) != null) {
                        buf.append(line).append("\n");
                    }
                }
            } catch (IOException e) {
                System.out.println("IO Error:" + e.getMessage());
            }
            // buf now holds the HTML of the page
        }
    }
    So, at the end, you have the data in your string buffer, and you can use it wherever you want.
    Hope this helps

  • Extract text from htm page.

    I am trying to build an Automator action that extracts a single piece of text from many .htm pages. The difficulty is that the text is followed by a reference made up of numbers and letters, which differs on each page. Example: I have 872 archived .htm pages, and from each one I need to extract a line like this: (inventory number:78899b3). Is it possible to extract this reference from each page and send it to a text editing application?
    Thank you

    // FirstParser.java
    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;

    public class FirstParser {
        public FirstParser() throws Exception {
            DefaultHandler handler = new MyHandler();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new File("test.xml"), handler);
        }
    }

    // MyHandler.java
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class MyHandler extends DefaultHandler {
        public void startDocument() throws SAXException {
            // called when parsing of the document starts
        }

        public void startElement(String namespaceURI, String sName, String qName, Attributes attr) throws SAXException {
            // called when <......> is opened
            // for <HTML>, qName = "HTML"
        }

        public void endElement(String namespaceURI, String sName, String qName) throws SAXException {
            // called when </......> closes an element
        }

        public void characters(char[] buf, int offset, int len) throws SAXException {
            // for <H1>This is a heading</H1>, s = "This is a heading"
            String s = new String(buf, offset, len);
        }

        public void endDocument() throws SAXException {
            // called at the end of the document
        }
    }
    In the MyHandler class you tell the parser what to do when each callback method is triggered.
    This lets you pick the required data out of the HTML, provided the pages are well-formed XML (XHTML); a SAX parser rejects markup that is not well formed.
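
    If the pages are ordinary HTML rather than well-formed XML, a plain regular expression may be simpler than a SAX parser. A minimal sketch, assuming every file contains the reference in the form shown in the question, `inventory number:78899b3`, and that the .htm files sit in one folder. The folder name "pages", the output file "inventory.txt", and the class name are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InventoryNumberExtractor {
    // Captures the letters and digits that follow "inventory number:".
    private static final Pattern INVENTORY = Pattern.compile(
        "inventory number:\\s*([0-9A-Za-z]+)", Pattern.CASE_INSENSITIVE);

    // Returns the first inventory reference in the text, if there is one.
    public static Optional<String> findInventoryNumber(String html) {
        Matcher m = INVENTORY.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Scans every *.htm file in the folder and returns "file<TAB>reference" lines.
    public static List<String> collect(Path folder) {
        List<String> found = new ArrayList<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(folder, "*.htm")) {
            for (Path file : files) {
                // ISO-8859-1 decodes any byte sequence; the real encoding is an assumption.
                String html = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
                findInventoryNumber(html)
                    .ifPresent(number -> found.add(file.getFileName() + "\t" + number));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path folder = Paths.get(args.length > 0 ? args[0] : "pages"); // hypothetical folder
        Files.write(Paths.get("inventory.txt"), collect(folder), StandardCharsets.UTF_8);
    }
}
```

    The resulting text file has one line per page and can be opened in any text editor.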

  • How ias integrate with Snacktory for getting main text from an html page

    Hi All,
    I am new to Endeca and IAS. I have a requirement to extract the main text from a whole HTML page before IAS saves the text to the Endeca_Document_Text property.
    Because IAS saves all of the text on the page to Endeca_Document_Text, the result is not readable when shown on a web page, so I use a third-party API (Snacktory) to filter the main text out of the original page.
    I now want to save that filtered text to the Endeca_Document_Text property.
    Another question:
    I get zero pages when I run the main-text filtering logic on the original HTML in a ParseFilter (HTMLMetatagFilter implements ParseFilter) using Snacktory.
    If the filter does only a little work, the crawl works fine; if it does more, the crawler fails to crawl the page. Does anyone know how to fix this?
    Crawler log:
    Successfully set recordstore configuration.
    INFO    2013-09-03 00:56:42,743    0    com.endeca.eidi.web.Main    [main]    Reading seed URLs from: /home/oracle/oracle/endeca/IAS/3.0.0/sample/myfirstcrawl/conf/endeca.lst
    INFO    2013-09-03 00:56:42,744    1    com.endeca.eidi.web.Main    [main]    Seed URLs: [http://www.liferay.com/community/forums/-/message_boards/category/]
    INFO    2013-09-03 00:56:43,497    754    com.endeca.eidi.web.db.CrawlDbFactory    [main]    Initialized crawldb: com.endeca.eidi.web.db.BufferedDerbyCrawlDb
    INFO    2013-09-03 00:56:43,498    755    com.endeca.eidi.web.Crawler    [main]    Using executor settings: numThreads = 100, maxThreadsPerHost=1
    INFO    2013-09-03 00:56:44,163    1420    com.endeca.eidi.web.Crawler    [main]    Fetching seed URLs.
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:52,890    10147    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:59,184    16441    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,058    24315    com.endeca.eidi.web.Crawler    [main]    Seeds complete.
    INFO    2013-09-03 00:57:07,090    24347    com.endeca.eidi.web.Crawler    [main]    Starting crawler shut down
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Waiting for running threads to complete
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Progress: Level: Cumulative crawl summary (level)
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    host-summary: www.liferay.com to depth 1
    host    depth    completed    total    blocks
    www.liferay.com    0    0    1    1
    www.liferay.com    1    0    0    0
    www.liferay.com    all    0    1    1
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    host-summary: total crawled: 0 completed. 1 total.
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    Shutting down CrawlDb
    INFO    2013-09-03 00:57:07,160    24417    com.endeca.eidi.web.Crawler    [main]    Progress: Host: Cumulative crawl summary (host)
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]   Host: www.liferay.com:  0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Progress: Perf: All (cumulative) 23.6s. 0.0 Pages/s. 0.0 kB/s. 0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Crawl complete.
    ~/oracle/endeca
    -======================================
    source code for parsefilter
    package com.endeca.eidi.web.parse;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.log4j.Logger;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseData;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.protocol.Content;
    import de.jetwick.snacktory.ArticleTextExtractor;
    import de.jetwick.snacktory.JResult;
    public class HTMLMetatagFilter implements ParseFilter {
        public static String METATAG_PROPERTY_NAME_PREFIX = "Endeca.Document.HTML.MetaTag.";
        public static String CONTENT_TYPE = "text/html";
        private static final Logger logger = Logger.getLogger(HTMLMetatagFilter.class);

        public Parse filter(Content content, Parse parse) throws Exception {
            logger.info("come into EndecaHtmlParser getParse");
            logger.info("come into HTMLMetatagFilter");
            //update the content with the main text in html page
            //content.setContent(HtmlExtractor.extractMainContent(content));
            parse.getData().getParseMeta().add("FILTER-HTMLMETATAG", "ACTIVE");
            ParseData parseData = parse.getData();
            if (parseData == null) return parse;
            extractText(content, parse);
            logger.info("update the content with the main text content");
            return parse;
        }

        private void extractText(Content content, Parse parse){
            try {
                ParseData parseData = parse.getData();
                if (parseData == null) return;
                Metadata md = parseData.getParseMeta();
                ArticleTextExtractor extractor = new ArticleTextExtractor();
                String sourceHtml = new String(content.getContent());
                JResult res = extractor.extractContent(sourceHtml);
                String text = res.getText();
                md.set("Endeca_Document_Text", text);
            } catch (Exception e) {
                // TODO: handle exception
            }
        }

        public static void log(String msg){
            System.out.println(msg);
        }

        public Configuration getConf() {
            return null;
        }

        public void setConf(Configuration conf) {
        }
    }

    > but it only extracts URLs from <A> (anchor) tags. I want to be able to extract URLs from <MAP> tags as well
    Gee, do you think you could modify the code to check for "Map" attributes as well?
    > Can someone maybe point a page containing info on the HTML toolkit for me?
    It's called the API. Since you are using the HTMLEditorKit and an ElementIterator and an AttributeSet, I would start there.
    There is no such API that says "get me all the links", so you have to do a little work on your own.
    Maybe you could use a ParserCallback and every time you get a new tag you check for the "href" attribute.
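
    The ParserCallback suggestion could look roughly like the sketch below. It checks every tag for an "href" attribute, so it picks up <A> links and the <AREA> entries inside a <MAP> alike. The class name LinkCollector and the method name collectHrefs are hypothetical; the parser classes are the standard Swing ones.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkCollector {
    // Returns the value of every href attribute found in the HTML, in document order.
    public static List<String> collectHrefs(String html) {
        final List<String> links = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                remember(attrs); // tags with content, such as <a href="...">
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                remember(attrs); // empty tags, such as <area href="..."> inside <map>
            }

            private void remember(MutableAttributeSet attrs) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            }
        };
        try {
            // "true" tells the parser to ignore any charset declared in the page.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"a.html\">A</a>"
                    + "<map name=\"m\"><area shape=\"rect\" coords=\"0,0,9,9\" href=\"b.html\"></map>"
                    + "</body></html>";
        collectHrefs(html).forEach(System.out::println);
    }
}
```

    Checking the attribute rather than the tag name keeps the code short, at the cost of also collecting href values from other tags such as <link>; filter on the tag if that matters.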

  • How to extract text from a PDF file?

    Hello Suners,
    I need to know how to extract text from a PDF file.
    Does anyone know what character encoding a PDF file uses? When I read the file through an input stream, I get unreadable (seemingly encrypted) characters rather than the original text in the file.
    Is there any procedure I should follow when reading a PDF file?
    File f = new File("D:/File.pdf");
    FileReader fr = new FileReader(f);
    BufferedReader br = new BufferedReader(fr);
    String s = br.readLine();
    Any help will be deeply appreciated.

    jverd wrote:
    > First, you set i once, and then loop without ever changing it. So your loop body will execute either 0 times or infinitely many times, writing the same byte every time. Actually, maybe it'll execute once and then throw an ArrayIndexOutOfBoundsException. That's basic java looping, and you're going to need a firm grip on that before you try to do anything as advanced as PDF reading.
    Oops, you are absolutely right, that was a silly mistake to forget that.
    > Second, what do the docs for getPageContent say? Do they say that it simply gives you the text on the page as if the thing were a simple text doc? I'd be surprised if that's the case.
    getPageContent returns an array of bytes, so the question becomes:
    how do I get text from this array? I was thinking of:
        private void jButton1_actionPerformed(ActionEvent e) {
            PdfReader read;
            StringBuffer buff=new StringBuffer();
            try {
                read = new PdfReader("d:/getjobid2727.pdf");
                read.getMetaData();
                byte[] data=read.getPageContent(1);
                int i=0;
                while(i>-1){
                    buff.append(data);
                    i++;
                }
                String str=buff.toString();
                FileOutputStream fos = new FileOutputStream("D:/test.txt");
                Writer out = new OutputStreamWriter(fos, "UTF8");
                out.write(str);
                out.close();
                read.close();
            } catch (Exception f) {
                f.printStackTrace();
            }
        }
    "D:/test.txt" hasn't been created when I ran the program.
    Are my steps right?

  • How to download an animated .gif from a web page

    How do I download an animated GIF from a web page to my Mac? When I want a picture or text, I just press Apple-Shift-4 and "take a picture of it", but here I want the animation, and I have never had to do that on this Mac. A Windows person from the site (a chat site) said to just right-click, but my wireless Mac mouse doesn't do that. I do know how to download from a site that offers downloadable .gifs: you just click the download sign they have, no problem. But this is just a random animated .gif on a web page. I hope my question is clear. I would appreciate any help. Thank you.

    Drag and drop it from the webpage onto your desktop. Note that it won't remain "animated" while you're looking at the file's icon, but it should work if you build it into a new website.
    And you can right-click with a Mac mouse - you just have to set it up to do so in System Preferences. (Set the right side of the mouse to be a "secondary click".)
    Matt

  • Inserting Text into a web page

    I need to know how to insert text into a web page, such as Google, run a search, and then get the result.
    Using the wonders of the internet I have been able to display a webpage, and see the HTML code that is generated, however, I don't know how to insert text into a web page, or get the result.
    Any help, or pointing into the right direction would be appreciated.

    What do you mean by "insert into a web page"? Do you want to prefill data on a form, or run a search like on Google? What are you using to do this?
    Based on the information in your question, I am guessing you might just want to look at the URL of the page. For example, run a search on Google and you will see the text you entered in the search field inside the URL. Learn from that and build a URL that searches for the text you want to insert.
    If I am not understanding you, please clarify.

    Right now I'm using the URL to search for me, which works, but the search URL differs from website to website, and I would like something closer to an automated form filler in case I need to implement something like this again.
    Of course, something like an automated form filler would probably work better, provided I could retrieve the results.
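
    The URL approach described in this thread can be sketched roughly as follows. This is a minimal illustration in Python (the thread does not say which language is in use, so that is an assumption), and the site address, the field name "q", and the helper names build_search_url and fetch_results are all hypothetical. Look at the URL a real search produces on the target site to find the actual field name.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_search_url(base_url, field_name, text):
    """Return base_url with the search text encoded into its query string."""
    # urlencode escapes spaces, ampersands, etc. so the text is URL-safe.
    return f"{base_url}?{urlencode({field_name: text})}"


def fetch_results(url, timeout=10):
    """Download the page at url and return its HTML as a string."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Hypothetical site and field name; substitute the ones from the real form.
url = build_search_url("https://www.example.com/search", "q", "animated gif")
print(url)
```

    Calling fetch_results(url) would then return the result page's HTML for further parsing. Because the base URL and field name are parameters, the same two helpers can be pointed at a different site without rewriting the request logic, which goes some way toward the reusable "form filler" asked about above. Forms that submit with POST rather than GET would need a different request.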

  • Trying to highlight a section of text on a web page. Then copy and paste in Word. Just upgraded to Win7. Can't highlight text.

    Trying to highlight a section of text on a web page. Then copy and paste to Word. Just upgraded to Win7. Can't highlight text. Won't work in IE either. Think it is a Win7 issue.

    Hi Cousins --
    Glad to have helped! The dots indicate the level of helping:
    https://discussions.apple.com/static/apple/tutorial/reputation.html
    I was thinking this may help you in your conversion from "the dark side," LOL.
    It's an article just for switchers:
    http://www.apple.com/support/switch101/
