Acquiring HTML Text From HTML Page

I just want to make a simple HTML editor. I read the documentation, and yes, I know that not all pages conform to the specifications, but I just want to know how to get all that yummy HTML into text rather than rendering it as a web page. I am using JEditorPane, and I think the answer may lie in the javax.swing.text.* packages.

I'm just trying to make a code editor by allowing an HTML page to be loaded as a text file.

Assuming you have the HTML source and now wish to get the plain text from it, the following piece of code can help you:
// requires javax.swing.JTextPane and javax.swing.text.BadLocationException
private String getPlainText(String htmlText) {
    String str = null;
    JTextPane text = new JTextPane();
    text.setContentType("text/html");
    text.setText(htmlText);
    try {
        // the document model holds the rendered text, without the tags
        str = text.getDocument().getText(0, text.getDocument().getLength());
    } catch (BadLocationException e) {
        e.printStackTrace();
    }
    return str;
}
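
The original question asks for the opposite direction: keeping the markup itself as text. A minimal sketch of that, assuming the page is reachable by URL and encoded as UTF-8 (the class and method names below are made up for illustration):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class HtmlSource {
    /** Reads every character from the reader without interpreting any markup. */
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    /** Fetches the raw markup of a page; UTF-8 is an assumption, not detected. */
    static String fetch(String address) throws IOException {
        try (Reader in = new BufferedReader(new InputStreamReader(
                URI.create(address).toURL().openStream(), StandardCharsets.UTF_8))) {
            return readAll(in);
        }
    }
}
```

To show the result in a JEditorPane as source rather than as a rendered page, the pane's content type would probably need to be "text/plain" before calling setText with the fetched string.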

Similar Messages

  • I have been unable to print from FF to a Brother HL-1440. Printing generates pages with odd bits of text from the page, and sometimes one image. If I want to print off the web, I have to print to PDF. Older versions of FF print fine.

    I downloaded OpenOffice and am able to print documents (in particular, the same document I created in MS Word). I guess MS Word is to blame. I believe the issue is concluded, but I'd welcome any comments about the Active Directory, as I might have a problem with OpenOffice in the future regarding that.

  • Can Pages copy text from one page to another automatically?

    Can Pages automatically copy text from one page to another?
    For example recurring text in a document e.g. addressee details or a date (not today's date).
    I want to type the topic title of one letter and have it automatically filled into the correct field in the next page which will be a letter to a different recipient.
    The letter text will be different for each recipient so mail merge probably won't work.
    Thanks
    Andy

    Thanks, that worked great!!! (for the first line)
    Now why won't it populate more than the one other text field?
    F.Lic.Table1.Row1.agt_ssn_lic.rawValue = xfa.event.newText;
    F.PAGE3.agt_ssn_1.rawValue = xfa.event.newText;
    F.PAGE4.agt_ssn_2.rawValue = xfa.event.newText;

  • Extract text from htm page.

    I am trying to build an Automator action that extracts a single piece of text from many htm pages. The problem is that this text is followed by a reference made up of numbers and letters. Example: I have 872 htm files from which I have to extract something like (inventory number:78899b3). Is it possible to extract this reference, which differs on each page, and send it to a text editing application?
    Thank you

    // FirstParser.java
    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.helpers.DefaultHandler;
    public class FirstParser {
      public FirstParser() throws Exception {
        DefaultHandler handler = new MyHandler();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new File("test.xml"), handler);
      }
    }

    // MyHandler.java
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;
    public class MyHandler extends DefaultHandler {
      @Override
      public void startDocument() throws SAXException {
        // start parsing document
      }
      @Override
      public void startElement(String namespaceURI, String sName, String qName, Attributes attr) throws SAXException {
        // when <......> is opened
        // in <HTML> ---- qName = "HTML"
      }
      @Override
      public void endElement(String namespaceURI, String sName, String qName) throws SAXException {
        // when </......> is reached (closing tag)
      }
      @Override
      public void characters(char[] buf, int offset, int len) throws SAXException {
        String s = new String(buf, offset, len); // <H1>This is a heading</H1> ---- s = "This is a heading"
      }
      @Override
      public void endDocument() throws SAXException {
        // the end of the document
      }
    }

    In the MyHandler class you tell the JVM what to do when each method is automatically triggered.
    This will enable you to obtain the required data from the HTML data. Note that a SAX parser expects well-formed XML, so this only works on pages that are valid XHTML.
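
For the single reference asked about in the question, a regular expression over each file's text may be simpler than a full parse. The sketch below assumes the reference always follows the literal label "inventory number:" and consists only of letters and digits, as in the example from the question; the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InventoryExtractor {
    // Assumed format: the literal label, optional whitespace, then letters and digits.
    private static final Pattern REF =
            Pattern.compile("inventory number:\\s*([0-9A-Za-z]+)");

    /** Returns every reference found in the given page text, in order of appearance. */
    static List<String> findReferences(String pageText) {
        List<String> refs = new ArrayList<>();
        Matcher m = REF.matcher(pageText);
        while (m.find()) {
            refs.add(m.group(1)); // group 1 is the reference without the label
        }
        return refs;
    }
}
```

Each of the 872 files could be read into a string and passed to findReferences, with the results appended to one text file.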

  • How can I copy/paste text from one page to another?

    I feel like I am taking crazy pills... is there really no way to copy a text box (or group) from one page to another while creating books in Aperture 3? Let's say I have my entire book laid out and I want to add one paragraph at the end of one page, which will push text and photos down a half page or so... most programs for the last decade or so have had really cool functionality for this... cut/paste... It seems like the Apple developers are aware of it, because the commands are in the context menus, but they don't work on text, only photo boxes. Additionally, you can copy/paste text but it erases all formatting.
    Seriously, am I missing something?  I have trouble believing that they can't include this function.
    Anyone else have this issue?

    Update... it seems to only happen with text boxes which I create.  So when I try an existing theme I can cut/copy/paste just fine, but in that same theme if you edit the layout and add a new text box, I can't copy paste it anywhere... the options are in the menu but nothing happens.
    Thoughts? Solutions?
    I am running 3.4.3 on a MacBook Pro OS X 10.8.3

  • Problems when I copy a text from master page field to body page field

    Hello
    I am developing an electronic form and I have run into some strange behavior.
    When I try to copy the rawValue of a listbox field on the master page to a textbox field on the body page, it works only intermittently.
    I have this code in the validation event of the master page field:
    --------------------begin code---------------------
    if (this.rawValue != null && this.rawValue != "")
    if (xfa.form.myform.mysubform.mytextfield.rawValue != this.rawvalue)
    xfa.form.myform.mysubform.mytextfield.rawValue = this.rawValue;
    --------------------end code---------------------
    Does anybody know what is wrong?
    I am very confused.
    Thanks a lot for your help.

    Extra code is one of the pitfalls of using WYSIWYG cutting and pasting,
    and using styling in general.
    So while it may be interesting to try to reduce the amount of code, there
    probably won't be any direct approach without extra steps to get you what
    you want with bare bones code that appears as you want it to or at least acceptable.
    I guess you might have to take a screen image, clip the relevant portion, since
    you can only attach an image file here not a text file.
    Posting a screen shot
    * http://kb.mozillazine.org/Posting_a_screenshot_on_the_forum#Windows

  • IPad won't copy text from web page and paste in email

    I'm trying to copy text from a web page, and when I paste it in Note or in a new email, it copied the web page link, not the text. Doing this in iPad.  What am I doing wrong? Thanks

    I think it depends on what you are trying to copy.
    To copy, put your finger on the text and hold until the copy thing shows up. Adjust the size to what you want, then touch copy. (I know, you already knew that.) To paste, touch the screen where you want it to go until the paste thing shows up.
    I just did it
    glass, metal and minimalist design. Two.    ( pasted from a web page)
    Some pages are set up so that you cannot copy words or images.
    So try a different page to see if you are doing it right.   
    If you want, post the link to the page and we can all try it as well.

  • How to get back to Text from an Pages ePub footnote on an iPhone!

    I have noticed that there does not seem to be a way to return to a page that one has left to read a "footnote" that has been converted to an endnote in export to ePub. It all looks great, but I cannot find a way to return to the page I was reading after I follow a footnote link from that page to the end of the document where the footnotes are cataloged... short of going back to the table of contents and then manually finding where I left off reading before I clicked on the footnote number in ePub.
    Any suggestion... maybe this is a fault of ePub or of iBooks reader?

    I have noticed that there does not seem to be a way to return to a page that one has left to read a "footnote" that has been converted to an endnote in export to ePub.
    It depends on the software used to create the ePUB file. Some applications may not support returns, others return by pressing the "footnote number" associated with the specific "footnote" that sent you to the "Endnotes," and still others (e.g., Legend Maker) employ a separate symbol placed at the end of the "footnote" entry to return you to your sending text.

  • How do I "flow" text from one page to another

    I think I have a simple question that I can't seem to figure out. I am creating an electronic letterhead for my office to use. What I need is a text box that starts on the first page and "flows" the text to the following pages as the user types the letter. Does anyone have any idea how to do this? A step-by-step answer would be very helpful.
    Thanks, Aaron

    First you'll need to save the form as a dynamic form. Once you've done that, you'll need to click on the field you want to expand and check the Allow Multiple Lines checkbox in the Field tab under Object. Then go into the Layout tab and check Expand to fit under Height.
    If this isn't exactly what you're looking for, a quick search using the forum search might help you find what you need.
    Regards,
    Dave

  • How can I automatically create calendar events using text from a Pages document?

    Hello,
    I'm looking for a way that I can automatically have calendar events created, by extracting dates and times from a table within a Pages document I have saved on my Mac.
    Currently, I record my working hours/dates on a Pages document in table format, so that I can record and ensure I receive payment for all hours I work.
    After finding out which shifts I have for the week, I insert the day, date, start time and end time (for each shift), into a table within a Pages document.
    I'm wondering if there is any way - such as through Automator, Apple Scripts, etc. - that I can then have the Calendar app automatically create events from that data - including the date, start and end times for each shift?
    Also, if possible, is there a way to set each event to automatically alert me at a chosen time (1 day, 2 days, etc.) beforehand?
    Here is an example of the layout of my document table:
    Date                         Start       Finish     Duration
    Saturday, 21 December 2013   8:00 AM     5:00 PM    9:00 hrs
    Sunday, 22 December 2013     9:00 AM     6:00 PM    9:00 hrs
    Monday, 23 December 2013     12:00 PM    9:00 PM    9:00 hrs
    Tuesday, 24 December 2013    12:00 PM    6:00 PM    6:00 hrs
    If anyone can help with this question, that would be greatly appreciated, as then I could have my calendar automatically create and sync my work shifts across to my iPhone, iPad and Mac.
    Thanks in advance,
    John.

    I totally agree with you.
    Where are the fixes for a long string of bugs, glitches and user issues?
    Looking at the list of new "features" for the next OSX, Maverick (what a dumb name!), all I am seeing is Apple ripping off other peoples' ideas, something it swinges others mercilessly for.
    There is not one thing in Maverick that I don't already have, only more so, with 3rd party add-ons.
    Apple seems bereft of ideas now and I am totally mystified what it is doing with all that money and employees it has accumulated.
    Peter

  • How to integrate IAS with Snacktory to get the main text from an HTML page

    Hi All,
    I am new to Endeca and IAS. I have a requirement to extract the main text from the whole HTML page before IAS saves the text to the Endeca_Document_Text property.
    Because IAS saves all of the text on the page to Endeca_Document_Text, the result is not readable when shown on a web page, so I use a third-party API to filter the main text out of the original page.
    Now I want to save this text to the Endeca_Document_Text property.
    Another question:
    I get zero pages when I run the main-text filtering logic on the original HTML in a ParseFilter (HTMLMetatagFilter implements ParseFilter) using Snacktory.
    If the filter does only a little work, it works fine; if it does more, the crawler fails to crawl the page. Does anyone know how to fix it?
    Log for the crawler:
    Successfully set recordstore configuration.
    INFO    2013-09-03 00:56:42,743    0    com.endeca.eidi.web.Main    [main]    Reading seed URLs from: /home/oracle/oracle/endeca/IAS/3.0.0/sample/myfirstcrawl/conf/endeca.lst
    INFO    2013-09-03 00:56:42,744    1    com.endeca.eidi.web.Main    [main]    Seed URLs: [http://www.liferay.com/community/forums/-/message_boards/category/]
    INFO    2013-09-03 00:56:43,497    754    com.endeca.eidi.web.db.CrawlDbFactory    [main]    Initialized crawldb: com.endeca.eidi.web.db.BufferedDerbyCrawlDb
    INFO    2013-09-03 00:56:43,498    755    com.endeca.eidi.web.Crawler    [main]    Using executor settings: numThreads = 100, maxThreadsPerHost=1
    INFO    2013-09-03 00:56:44,163    1420    com.endeca.eidi.web.Crawler    [main]    Fetching seed URLs.
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:52,890    10147    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:59,184    16441    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,058    24315    com.endeca.eidi.web.Crawler    [main]    Seeds complete.
    INFO    2013-09-03 00:57:07,090    24347    com.endeca.eidi.web.Crawler    [main]    Starting crawler shut down
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Waiting for running threads to complete
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Progress: Level: Cumulative crawl summary (level)
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    host-summary: www.liferay.com to depth 1
    host    depth    completed    total    blocks
    www.liferay.com    0    0    1    1
    www.liferay.com    1    0    0    0
    www.liferay.com    all    0    1    1
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    host-summary: total crawled: 0 completed. 1 total.
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    Shutting down CrawlDb
    INFO    2013-09-03 00:57:07,160    24417    com.endeca.eidi.web.Crawler    [main]    Progress: Host: Cumulative crawl summary (host)
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]   Host: www.liferay.com:  0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Progress: Perf: All (cumulative) 23.6s. 0.0 Pages/s. 0.0 kB/s. 0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Crawl complete.
    ~/oracle/endeca
    -======================================
    source code for parsefilter
    package com.endeca.eidi.web.parse;

    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.log4j.Logger;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseData;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.protocol.Content;
    import de.jetwick.snacktory.ArticleTextExtractor;
    import de.jetwick.snacktory.JResult;

    public class HTMLMetatagFilter implements ParseFilter {
        public static String METATAG_PROPERTY_NAME_PREFIX = "Endeca.Document.HTML.MetaTag.";
        public static String CONTENT_TYPE = "text/html";
        private static final Logger logger = Logger.getLogger(HTMLMetatagFilter.class);

        public Parse filter(Content content, Parse parse) throws Exception {
            logger.info("come into EndecaHtmlParser getParse");
            logger.info("come into HTMLMetatagFilter");
            //update the content with the main text in html page
            //content.setContent(HtmlExtractor.extractMainContent(content));
            ParseData parseData = parse.getData();
            if (parseData == null) return parse;
            // check for null before touching the parse metadata
            parseData.getParseMeta().add("FILTER-HTMLMETATAG", "ACTIVE");
            extractText(content, parse);
            logger.info("update the content with the main text content");
            return parse;
        }

        private void extractText(Content content, Parse parse) {
            try {
                ParseData parseData = parse.getData();
                if (parseData == null) return;
                Metadata md = parseData.getParseMeta();
                ArticleTextExtractor extractor = new ArticleTextExtractor();
                String sourceHtml = new String(content.getContent());
                JResult res = extractor.extractContent(sourceHtml);
                String text = res.getText();
                md.set("Endeca_Document_Text", text);
            } catch (Exception e) {
                // log instead of swallowing, so an extraction failure shows up in the crawl log
                logger.error("main text extraction failed", e);
            }
        }

        public static void log(String msg) {
            System.out.println(msg);
        }

        public Configuration getConf() {
            return null;
        }

        public void setConf(Configuration conf) {
        }
    }

    but it only extracts URLs from <A> (anchor) tags. I want to be able to extract URLs from <MAP> tags as well
    Gee, do you think you could modify the code to check for "Map" attributes as well?
    Can someone maybe point a page containing info on the HTML toolkit for me?
    It's called the API. Since you are using the HTMLEditorKit and an ElementIterator and an AttributeSet, I would start there.
    There is no such API that says "get me all the links", so you have to do a little work on your own.
    Maybe you could use a ParserCallback and every time you get a new tag you check for the "href" attribute.
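
The ParserCallback suggestion could look roughly like the sketch below. It is an illustration only, not the original poster's code: the class name is made up, and it relies on the Swing HTML parser reporting <A> through handleStartTag and the empty <AREA> elements inside a <MAP> through handleSimpleTag.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class LinkCollector extends HTMLEditorKit.ParserCallback {
    final List<String> links = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        collect(attrs); // tags with content, such as <A>
    }

    @Override
    public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
        collect(attrs); // empty tags, such as <AREA> inside a <MAP>
    }

    // Records the href attribute of any tag that carries one.
    private void collect(MutableAttributeSet attrs) {
        Object href = attrs.getAttribute(HTML.Attribute.HREF);
        if (href != null) {
            links.add(href.toString());
        }
    }

    /** Parses the markup and returns every href value encountered. */
    static List<String> extract(Reader html) throws IOException {
        LinkCollector collector = new LinkCollector();
        new ParserDelegator().parse(html, collector, true);
        return collector.links;
    }
}
```

Checking the attribute rather than the tag name means any element with an href is picked up, which may be broader than wanted; filtering on HTML.Tag.A and HTML.Tag.AREA would narrow it.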

  • Taking regular text from a saved web page through java.

    Hello, currently I'm trying to save the source code from a regular HTML web page as a string. Which in fairness is pretty simple through the URL and InputStreamReader classes.
    But my problem is I'm not fully sure how to read the source code when the page is saved on your hard drive, as (I think) the URL class tries to look for a non-existent host.
    I think I need to use the HTMLEditor class but my knowledge of this is vague at best.
    Any ideas?
    Also, I'm just trying to take out the regular text from the page (i.e. what you see on the screen) and store that. Is there any way specifically I should go about this? I doubt it's a matter of going through the entire string and looking for specific tags to remove.
    Thanks in advance
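
One possible approach to both parts of this question, sketched under assumptions: the saved page is read as an ordinary file (no URL needed), UTF-8 is assumed as its encoding, and the Swing HTML parser is used to collect only the text nodes. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

final class VisibleText extends HTMLEditorKit.ParserCallback {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void handleText(char[] data, int pos) {
        // Called once per run of text between tags; head content such as the title arrives here too.
        if (text.length() > 0) {
            text.append('\n');
        }
        text.append(data);
    }

    /** Parses the markup and returns the text runs, one per line. */
    static String extract(Reader html) throws IOException {
        VisibleText callback = new VisibleText();
        new ParserDelegator().parse(html, callback, true);
        return callback.text.toString();
    }

    /** Reads a page saved on disk; UTF-8 is an assumption about the file. */
    static String extractFromFile(Path file) throws IOException {
        try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            return extract(in);
        }
    }
}
```

If a URL is still preferred for a local file, converting a File with toURI().toURL() yields a file: URL that does not involve any host lookup.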

    Yes, I need to calm down. That worked me up a bit, I'll admit.
    Oh, I'm with you 100%. The iPad is a nice 'accessory' to a PC or Mac. No replacement, by far.
    I didn't share this, but I think it was in a frame or a part of a link. The iPad wanted to follow the link. At the other page was the same info. Still, I could not select it.
    The iPad 2 has dual processors. Do you think that could make a difference?
    I was advised once to reboot my cell phone often, to clear its memory. Does the iPad need this as well?
    I can select all day long while in this text edit box. But if I wanted to quote you by copy/paste, I cannot. Interesting. And it is right above and on the same page I am working on.
    Definitely an area for improvement.

  • Chapter Title & Introduction text from Pages?

    Hi,
    I designed a chapter layout in iBooks Author. But I can't seem to figure out how to get the Chapter Introduction text and Chapter title text from my Pages document into Author.
    I've created and applied style sheets within Pages and dragged the document into Author. But all of that text is displayed in a page after the chapter page instead of within it.
    And if I go to the first chapter in Author, the text isn't editable. I could delete these text boxes in layout mode, and do every chapter manually. But is it possible for Author to insert text into the corresponding text boxes based on a style sheet?
    Has anyone been able to do this?
    Thanks!

    John,
    I have not tried this, but you could try creating a master page for the first page in the chapter. Put two frames on the master page BOTH with the flow set to A. Position the first frame where you want the title to be and position the second frame where you want the text to begin. Set the paragraph tag for the title to have no space following. Set the paragraph tag for the FIRST paragraph of the text to Keep X rows together. With X set large enough, FrameMaker would not have room to put the first paragraph in the upper frame, so it will push it to the next frame. If this works, you can even go back to your original side head format for the title. You just have to make sure the top frame is large enough to hold all titles, so that a title does not leak to the second frame.
    Van

  • How can i paste text from a web site to an open office document as rtf?

    I'm running Firefox under Windows 7. I'm unable to copy text from web pages to the clipboard as RTF-formatted data. I tried pasting into MS WordPad and OpenOffice 3.3 Writer. IE allows pasting as RTF text, but I'd prefer using Firefox. Why is RTF not an available format?

    Curnow 1 wrote the following: "you should do a tap wait a sec and 1 more tap in the same place and hold the Second tap ; https://www.dropbox.com/s/23whtlt2gizxu27/MOV_1048.mp4?dl=0"
    Curnow1, this actually worked! Holy cow, who'da thought?!! Thank you. I gave you a kudo and I marked this as a solution. Thanks again!
    PS - I'm amazed at how long it takes me to accomplish seemingly simple things with a smart phone. This is my first smart phone ever. I am a former computer programmer and am very PC literate. Still, sometimes I pull my hair out. Thanks again!

  • Copying text from internet into Appleworks

    Hello!
    So, I would like to copy some exact text, same colors and fonts, from a web page in Firefox into AppleWorks. When I paste the selection into a word processing document, the text comes up, but it pastes in Lucida Grande and in black. Why is this? Is there a way that I can paste the exact colors and font into a document? I looked through the Help Viewer pages, but I couldn't find anything on this topic.
    Thank you!
    -Kat Forster

    In addition to Niel's excellent suggestions, have you tried another web browser? I think this could be a limitation of Firefox. I just copied and pasted text from a page (my home page) from both Firefox and Safari. Text from Firefox changed to Lucida Grande in all black, while text from Safari pasted as Times with underlined blue text for the links, just as they are on the page. I used this page because I know that it has the text in a default serif font.
