Help: get tag content from an input HTML file

I would like to build an index from the tag content of an input HTML file.
Is there a convenient way to retrieve that tag information?
Thanks

http://onesearch.sun.com/search/onesearch/index.jsp?qt=html+parser&subCat=&site=dev&dftab=&chooseCat=javaall&col=developer-forums
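
For what it's worth, here is a minimal sketch of one way to pull out tag content with the HTML parser that ships with the JDK (`javax.swing.text.html`). It is only an illustration, not the approach from the search results above: the class name `TagTextCollector` and the method `collect` are made up for this example, and the sample HTML in `main` is invented. The callback gathers the text between each start and end tag of the type you ask for, which you could then feed into an index.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the text inside every occurrence of one tag.
public class TagTextCollector {

    /** Returns the text found inside each occurrence of the wanted tag. */
    public static List<String> collect(Reader html, final HTML.Tag wanted) throws IOException {
        final List<String> found = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0;          // > 0 while inside the wanted tag
            private StringBuilder current;  // text gathered for the open tag

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t.equals(wanted)) {
                    if (depth == 0) {
                        current = new StringBuilder();
                    }
                    depth++;
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) {
                    current.append(data);
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if (t.equals(wanted) && depth > 0) {
                    depth--;
                    if (depth == 0) {
                        found.add(current.toString());
                    }
                }
            }
        };
        // 'true' tells the parser to ignore any charset directive in the page.
        new ParserDelegator().parse(html, callback, true);
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample input, only for demonstration.
        String html = "<html><head><title>Demo page</title></head>"
                + "<body><h1>First</h1><p>ignored</p><h1>Second</h1></body></html>";
        System.out.println(collect(new StringReader(html), HTML.Tag.TITLE));
        System.out.println(collect(new StringReader(html), HTML.Tag.H1));
    }
}
```

The Swing parser is lenient but old (it follows an HTML 3.2 DTD), so for messy modern pages a dedicated HTML parser library may behave better.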

Similar Messages

  • Get the content from the PDF file IN WD ABAP View

    Hi all,
    I have an offline interactive form with data filled in, saved on my desktop. I want to upload the PDF content (data) into a Web Dynpro view.
    Currently I have the view designed with all the input boxes bound to the respective fields in the context, and these fields are the same as the Adobe form fields.
    I have a UI element to browse and pick the file from the desktop, and an upload button to upload the data into it.
    When I click the upload button, it dumps with " No Enough Information for processing or Output".
    *Get the content from the file
      WD_CONTEXT->GET_ATTRIBUTE( EXPORTING NAME = 'PDFSOURCE'  IMPORTING VALUE = CONTENT ).
    In the above statement CONTENT is initial, hence the dump. I have taken this source from an SDN blog.
    Kindly help me find where I went wrong.
    Thanks in advance.

    Hi,
    try my solution (reading a dynamic table, but you can read any data from the PDF XML) described here:
    Dynamic Table data cannot be Read.
    Regards Jiri

  • HELP: I have lost all my Motion 4 content from all the files. I've looked in all the files and they are all empty. Can I get replacement content, as my DVDs were stolen along with my other laptop?

    HELP: I have lost all my Motion 4 content from all the files. I've looked in all the files and they are all empty. Can I get replacement content, as my DVDs were stolen along with my other laptop? Any ideas would be a great help. For Motion 4.

    Try a hard reset: hold the home button AND the power switch, ignore the red "Slide to power off", and wait until the iPhone powers down and then restarts. At the Apple logo, release all the buttons and see if your contacts reappear.

  • Getting links and its names from a html file

    Hi everyone
    My problem is about getting links together with their names from an HTML file. For example:
    In a web page on this site there is a link shown as "SUN"; when the user clicks SUN, the browser opens http://java.sun.com
    I want both of them, the link and the name. I have succeeded in getting the link, but I don't know how to get the link name.
    For example:
    <B>setRightComponent(Component)</B>
    In this code segment I want to get the B tag, but I don't know how. To get the A tag I used this code:
    List<String> result = new ArrayList<String>();
    try {
        // Create a reader on the HTML content
        URL url = new URI(uriStr).toURL();
        URLConnection conn = url.openConnection();
        Reader rd = new InputStreamReader(conn.getInputStream());
        // Parse the HTML
        EditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(rd, doc, 0);
        // Find all the A elements in the HTML document
        HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
        while (it.isValid()) {
            SimpleAttributeSet s = (SimpleAttributeSet) it.getAttributes();
            String link = (String) s.getAttribute(HTML.Attribute.HREF);
            if (link != null) {
                result.add(link);
            }
            // Advance on every pass, even when the anchor has no HREF
            it.next();
        }
    } catch (Exception e) {
        // The posted snippet was cut off before this point; this handler is a placeholder
        e.printStackTrace();
    }
    I can iterate over the B tag the same way, but I don't know how to get its value because it has no attribute such as HREF.
    I am sorry if my explanation is unclear or I used incorrect words.

    import java.io.*;
    import java.net.*;
    import javax.swing.text.*;
    import javax.swing.text.html.*;

    class GetLinks
    {
        public static void main(String[] args)
            throws Exception
        {
            // Create a reader on the HTML content
            Reader reader = getReader( args[0] );
            // Parse the HTML
            EditorKit kit = new HTMLEditorKit();
            HTMLDocument doc = (HTMLDocument)kit.createDefaultDocument();
            doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
            kit.read(reader, doc, 0);
            // Find all the A elements in the HTML document
            HTMLDocument.Iterator it = doc.getIterator(HTML.Tag.A);
            while (it.isValid())
            {
                SimpleAttributeSet s = (SimpleAttributeSet)it.getAttributes();
                String href = (String)s.getAttribute(HTML.Attribute.HREF);
                // The link text is the document text between the element's offsets
                int start = it.getStartOffset();
                int end = it.getEndOffset();
                String text = doc.getText(start, end - start);
                System.out.println( href + " : " + text );
                it.next();
            }
        }

        // If 'uri' begins with "http:" treat as a URL,
        // otherwise, treat as a local file.
        static Reader getReader(String uri)
            throws IOException
        {
            // Retrieve from Internet.
            if (uri.startsWith("http:"))
            {
                URLConnection conn = new URL(uri).openConnection();
                return new InputStreamReader(conn.getInputStream());
            }
            // Retrieve from file.
            else
            {
                return new FileReader(uri);
            }
        }
    }

  • How to remove all of the tags from a HTML file

    Hi all,
    I am developing a search program.
    The user enters a word or some text in a text field, and after clicking the Go button the program searches for the word in an HTML file that resides on the C: drive.
    What I am trying to do is read the file, store its contents in a String, and then store that in a Vector, and so on.
    My question is: how can I remove all of the HTML tags, such as <p>, <b>, </b>, <h1>, <strong>, or whatever, from the String (where I store the contents of the HTML file) or from the HTML file itself?
    I would appreciate sample code if anyone has any.
    Please help me with this.
    Thanks in advance
    Thanks a lot.
    amitindia

    Hi dear,
    I got the link and have found examples.
    thanks for solving my problem.
    Thanks for your prompt reply.
    amitindia
    India
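
    Since the thread never shows the code, here is a rough sketch of one way to drop the tags using the JDK's own HTML parser. It is not the example the poster found: the class name `TagStripper` and the method `stripTags` are made up for illustration. The callback keeps only the text chunks the parser reports and ignores every tag.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: returns the visible text of an HTML string without tags.
public class TagStripper {

    public static String stripTags(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Separate chunks with a space so words from adjacent elements do not merge.
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        // 'true' tells the parser to ignore any charset directive in the page.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        // Collapse runs of whitespace left over from the markup layout.
        return text.toString().replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) throws IOException {
        // Invented sample input, only for demonstration.
        String html = "<html><body><h1>Title</h1><p>Some <b>bold</b> text</p></body></html>";
        System.out.println(stripTags(html));
    }
}
```

    One trade-off in this sketch: because a space is added between chunks, a word split by inline markup (for example `<b>H</b>ello`) comes out as two tokens. For a simple word search that is usually acceptable, but it is a design choice to be aware of.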

  • How to get the path of input type="file" tag

    I'm using an <input type="file"> tag to get an input file from the local host. It returns only the file name, not the complete path.
    I need to know how to get the complete path/directory of the file using the <input type="file"> tag. Or is there any other way to get an input file from the local host, aside from the <input type="file"> tag?
    Thanks

    http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/input_file.asp?frame=true
    When a file is uploaded, the file name is also submitted. The path of the file is available only to the machine within the Local Machine security zone. The value property returns only the file name to machines outside the Local Machine security zone. See About URL Security Zones for more information on security zones.
    > I need to know how to get the complete path/directory of the file
    > using the <input type="file"> tag
    You can't. It's a security thing.
    > is there any other way to get an input file from the local host aside from the <input type="file"> tag?
    No. Not using just HTML.
    You could always go into ActiveX components, but that's different again.
    Cheers,
    evnafets

  • Get all content from iTouch to a new computer.

    My problem is very simple to explain, but I find no solution.
    My old computer is damaged and I can't use it anymore. So I bought a new one, and I want to get all my stuff from the iTouch into iTunes on my new computer. After installing iTunes, I activated my iTouch in iTunes. iTunes shows me a message that two computers are now activated (5 are possible).
    So I connected my iTouch to my computer with the cable, and I get the message that my iTouch is linked to another iTunes library. The only option I get is to erase my iTouch and synchronize it with the content of the new iTunes library.
    *My question is: how can I get my content from the iPod touch into iTunes on my new computer?*

    Hi,
    Welcome to Apple discussions.
    So, in a perfect world, you would restore your library of valuable music from *your backed up files* to the library on your new PC, you know, the ones burnt to DVD/CD or on an external HD.
    This is the way iTunes works, it makes it perfectly clear and actively encourages users to *back up your music*.
    The iPod touch is not a back up device (nor is it advertised as one) it is a music player with a volatile memory that could actually disappear at anytime so it should not be trusted as a store for valuable files.
    To salvage your music you will have to use some 3rd party application on your PC/Mac (you fail to say) such as [Touch Copy|http://www.wideanglesoftware.com/touchcopy/index.html], Google for other options. This is not guaranteed to work but hopefully you will get your music back.
    Then back up.
    Good luck,
    Dud.
    It's *iPod touch*, by the way; this is an [ITOUCH|http://en.pasen.it/product_detail.php?id=36]

  • How ias integrate with Snacktory for getting main text from an html page

    Hi All,
    I am new to Endeca and IAS. I have a requirement to extract the main text from the whole HTML page before IAS saves the text to the Endeca_Document_Text property.
    Because IAS saves all the text on the page to the Endeca_Document_Text property, it is not readable when shown in a web page. I use a third-party API to filter the main text out of the original page,
    and now I want to save this text to the Endeca_Document_Text property.
    Another question:
    I get zero pages when I do the main-text filtering on the original HTML in a ParseFilter (HTMLMetatagFilter implements ParseFilter) using Snacktory.
    If the filter does only a little work, it works fine; if it does more, the crawler fails to crawl the page. Does anyone know how to fix it?
    Log for the crawler:
    Successfully set recordstore configuration.
    INFO    2013-09-03 00:56:42,743    0    com.endeca.eidi.web.Main    [main]    Reading seed URLs from: /home/oracle/oracle/endeca/IAS/3.0.0/sample/myfirstcrawl/conf/endeca.lst
    INFO    2013-09-03 00:56:42,744    1    com.endeca.eidi.web.Main    [main]    Seed URLs: [http://www.liferay.com/community/forums/-/message_boards/category/]
    INFO    2013-09-03 00:56:43,497    754    com.endeca.eidi.web.db.CrawlDbFactory    [main]    Initialized crawldb: com.endeca.eidi.web.db.BufferedDerbyCrawlDb
    INFO    2013-09-03 00:56:43,498    755    com.endeca.eidi.web.Crawler    [main]    Using executor settings: numThreads = 100, maxThreadsPerHost=1
    INFO    2013-09-03 00:56:44,163    1420    com.endeca.eidi.web.Crawler    [main]    Fetching seed URLs.
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:46,519    3776    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:52,889    10146    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:52,890    10147    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-1]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:56:59,184    16441    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:56:59,185    16442    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into EndecaHtmlParser getParse
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    come into HTMLMetatagFilter
    INFO    2013-09-03 00:57:07,057    24314    com.endeca.eidi.web.parse.HTMLMetatagFilter    [pool-1-thread-2]    meta tag viewport ==minimum-scale=1.0, width=device-width
    INFO    2013-09-03 00:57:07,058    24315    com.endeca.eidi.web.Crawler    [main]    Seeds complete.
    INFO    2013-09-03 00:57:07,090    24347    com.endeca.eidi.web.Crawler    [main]    Starting crawler shut down
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Waiting for running threads to complete
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    Progress: Level: Cumulative crawl summary (level)
    INFO    2013-09-03 00:57:07,095    24352    com.endeca.eidi.web.Crawler    [main]    host-summary: www.liferay.com to depth 1
    host    depth    completed    total    blocks
    www.liferay.com    0    0    1    1
    www.liferay.com    1    0    0    0
    www.liferay.com    all    0    1    1
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    host-summary: total crawled: 0 completed. 1 total.
    INFO    2013-09-03 00:57:07,096    24353    com.endeca.eidi.web.Crawler    [main]    Shutting down CrawlDb
    INFO    2013-09-03 00:57:07,160    24417    com.endeca.eidi.web.Crawler    [main]    Progress: Host: Cumulative crawl summary (host)
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]   Host: www.liferay.com:  0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Progress: Perf: All (cumulative) 23.6s. 0.0 Pages/s. 0.0 kB/s. 0 fetched. 0.0 mB. 0 records. 0 redirected. 4 retried. 0 gone. 0 filtered.
    INFO    2013-09-03 00:57:07,162    24419    com.endeca.eidi.web.Crawler    [main]    Crawl complete.
    ~/oracle/endeca
    -======================================
    source code for parsefilter
    package com.endeca.eidi.web.parse;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.log4j.Logger;
    import org.apache.nutch.metadata.Metadata;
    import org.apache.nutch.parse.HTMLMetaTags;
    import org.apache.nutch.parse.Parse;
    import org.apache.nutch.parse.ParseData;
    import org.apache.nutch.parse.ParseFilter;
    import org.apache.nutch.protocol.Content;
    import de.jetwick.snacktory.ArticleTextExtractor;
    import de.jetwick.snacktory.JResult;
    public class HTMLMetatagFilter implements ParseFilter {
        public static String METATAG_PROPERTY_NAME_PREFIX = "Endeca.Document.HTML.MetaTag.";
        public static String CONTENT_TYPE = "text/html";
        private static final Logger logger = Logger.getLogger(HTMLMetatagFilter.class);

        public Parse filter(Content content, Parse parse) throws Exception {
            logger.info("come into EndecaHtmlParser getParse");
            logger.info("come into HTMLMetatagFilter");
            //update the content with the main text in html page
            //content.setContent(HtmlExtractor.extractMainContent(content));
            ParseData parseData = parse.getData();
            // Check for null before touching the parse metadata
            if (parseData == null) return parse;
            parseData.getParseMeta().add("FILTER-HTMLMETATAG", "ACTIVE");
            extractText(content, parse);
            logger.info("update the content with the main text content");
            return parse;
        }

        private void extractText(Content content, Parse parse) {
            try {
                ParseData parseData = parse.getData();
                if (parseData == null) return;
                Metadata md = parseData.getParseMeta();
                ArticleTextExtractor extractor = new ArticleTextExtractor();
                String sourceHtml = new String(content.getContent());
                JResult res = extractor.extractContent(sourceHtml);
                String text = res.getText();
                md.set("Endeca_Document_Text", text);
            } catch (Exception e) {
                // Log instead of swallowing, so extraction failures show up in the crawl log
                logger.error("main text extraction failed", e);
            }
        }

        public static void log(String msg) {
            System.out.println(msg);
        }

        public Configuration getConf() {
            return null;
        }

        public void setConf(Configuration conf) {
        }
    }

    > but it only extracts URLs from <A> (anchor) tags. I want to be able to extract URLs from <MAP> tags as well
    Gee, do you think you could modify the code to check for "Map" attributes as well.
    > Can someone maybe point a page containing info on the HTML toolkit for me?
    It's called the API. Since you are using the HTMLEditorKit and an ElementIterator and an AttributeSet, I would start there.
    There is no such API that says "get me all the links", so you have to do a little work on your own.
    Maybe you could use a ParserCallback and every time you get a new tag you check for the "href" attribute.
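
    The ParserCallback idea above could be sketched roughly like this. This is only one possible reading of the suggestion, not code from the thread: the class name `HrefCollector` and the method `collect` are invented for illustration. It checks every tag for an `href` attribute, so it picks up `<A>` links as well as the `<AREA>` entries inside a `<MAP>`. Tags without an end tag, such as `<AREA>`, are reported through `handleSimpleTag`, so both callbacks are overridden.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: collects the href attribute of every tag that has one.
public class HrefCollector {

    public static List<String> collect(Reader html) throws IOException {
        final List<String> links = new ArrayList<String>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private void record(MutableAttributeSet attrs) {
                Object href = attrs.getAttribute(HTML.Attribute.HREF);
                if (href != null) {
                    links.add(href.toString());
                }
            }

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(a);   // tags with an end tag, e.g. <A>
            }

            @Override
            public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                record(a);   // empty tags, e.g. <AREA> inside <MAP>
            }
        };
        // 'true' tells the parser to ignore any charset directive in the page.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Invented sample input, only for demonstration.
        String html = "<html><body><a href=\"a.html\">A</a>"
                + "<map name=\"m\"><area shape=\"rect\" coords=\"0,0,1,1\" href=\"b.html\"></map>"
                + "</body></html>";
        System.out.println(collect(new StringReader(html)));
    }
}
```

    Note that this also records `href` values from other tags such as `<LINK>`; filter on the tag argument if you only want anchors and image-map areas.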

  • Need how to get the data from the external file in eCatt

    Hi ,
      Could anybody suggest how to get values from an external file (Excel, CSV, or text file) and pass them as variables in an eCATT test script?
    Problem: I need to execute the FK01 (vendor creation) transaction with multiple sets of data. As per my understanding, we could achieve this through variants in a test data set in eCATT.
    But is there any way to store the data in an Excel file, read it, and pass it to the FK01 test script?
    I would appreciate a response on this.

    Hi
    See the links they may be useful
    check these link,
    eCATT- An Introduction
    /people/sumeet.kaul/blog/2005/07/26/ecatt-an-introduction
    Creating Test Scripts
    /people/sumeet.kaul/blog/2005/08/10/ecatt-creating-test-scripts
    eCATT Logs
    /people/sapna.modi/blog/2006/04/18/ecatt-logs-part-vi
    eCATT Scripts Creation – TCD Mode
    /people/sapna.modi/blog/2006/04/10/ecatt-scripts-creation-150-tcd-mode-part-ii
    Creation of Test Data Container
    /people/sumeet.kaul/blog/2005/08/24/ecatt-creation-of-test-data-container
    eCATT Scripts Creation - SAPGUI Mode
    /people/sapna.modi/blog/2006/04/10/ecatt-scripts-creation--sapgui-mode-part-iii
    Integrating ECATT & MERCURY QTP Part -1
    /people/community.user/blog/2007/01/02/integrating-ecatt-mercury-qtp-part-1
    Using eCatt to Test Web Dynpro ABAP
    /people/thomas.jung/blog/2006/03/21/using-ecatt-to-test-web-dynpro-abap
    and
    -command reference
    http://help.sap.com/saphelp_nw04/helpdata/en/c6/3c333b40389c46e10000000a114084/content.htm
    /people/sapna.modi/blog/2006/04/10/ecatt--an-introduction-part-i
    http://prasadbabu.blogspot.com
    https://www.sdn.sap.com/sdn/developerareas/was.sdn?page=test_tool_integration_for_sap_e-catt.htm
    http://help.sap.com/saphelp_nw04/helpdata/en/1b/e81c3b84e65e7be10000000a11402f/frameset.htm
    http://www.erpgenie.com/ecatt/index.htm
    hope this helps.
    Reward points for useful Answers
    Regards
    Anji

  • Updating content of index.html file of J2EE Engine

    Hi,
    I would like to update the content of index.html file, which is located under D:\usr\sap\NSP\JC01\j2ee\cluster\server0\apps\sap.com\com.sap.engine.docs.examples\servlet_jsp\_default\root\index.html.
    I have a requirement to change the HTML file content to show the latest news, and we want to access this news through http://host:port/index.html (to avoid authenticating to the server). In this HTML file I will place a link to connect to the portal.
    The problem is that the updated content of this HTML file is not displayed in the browser. It is being served from cache. How can I avoid this cache problem?
    Thanks

    Hello,
    I think you wish to change your logon screen, this will not be done simply by changing the index.html
    Please check this link for more information:
    http://help.sap.com/saphelp_nw70/helpdata/EN/ff/c7de3fc6c6ec06e10000000a1550b0/frameset.htm
    Regards,
    Siddhesh
    Edited by: Siddhesh Ghag on May 21, 2008 1:54 PM

  • Easy way to add content from a last file like a word document?

    Can somebody tell me how I can display content from a Word file?
    I have files with content on my network, for example a Word document.
    Inside this document I have hyperlinks to content on other pages of the same document. Is there a way or a method to display the content in the portal?
    Do I need to cut and paste the Word content, or should I save the content as an XML or HTML document and then add the document to the portal?
    Regards
    Andre

    Yeah, I have an idea... Have you tried the PDF export yet? If it is already formatted, then that may be the way to go. But you may need Acrobat to save it in another format. If I recall, InDesign renders the PDF, which would make editing the PDF in InDesign useless.
    OK, did a quick test with PDF. Load the PDF into Acrobat and save it as a Word file. Place the Word file into InDesign and you will have the styled paragraphs. You just have to alter the styles to taste, and maybe fix a few paragraphs if needed.

  • How do I restore my bookmarks from an HTML file?

    I have just installed Ubuntu 11.04 and I want to restore my bookmarks from an HTML file.
    This used to be accessible via "Bookmarks-> Organise Bookmarks" but the "Organise Bookmarks" menu item no longer exists so I'm stymied.
    This help system seems to think that it does exist, however, so please note that on my system, it does NOT. I suspect that hundreds, thousands, or even millions of new Ubuntu 11.04 users will be facing exactly the same problem.
    I don't think users will be able to help with this one unless someone knows how to restore the "Organise Bookmarks" menu item. It's the only one that will do what I want and so there's no point recommending "Show All Bookmarks" as a solution. That doesn't work.
    Finally, persuading this system that I really do have a new problem that isn't covered in the FAQs was a very irritating experience. I wonder if Chrome will know what to do with my "bookmarks.html" file?

    Update to my previous posting. I just installed Chromium, and it took '''LESS than 10 seconds''' to import my bookmarks. Why would I waste any more time with Firefox?? I had intended to give Firefox 4 a chance to show me what improvements had been made, but who needs the grief. Bad move guys... bad move.

  • How to get the content of the uploaded file.

    Hi Experts,
    I am using an input box to get the path of the file to be uploaded (I am not using the FILE UPLOAD UI element). Could you please let me know how to get the content of the uploaded file?
    Regards,
    Arun

    You will not be able to use a normal InputField to upload file contents.  This is not allowed by the browser security model. You must use one of the upload specific UI mechanisms - the FileUpload UI element or as of 7.01 ACFUpDownload or FlashIslands.

  • How to get UCM content from Java class?

    Hello,
    I need to get UCM content from the backend, I mean from a Java class. Is there any way to do that? If anybody can point me to any resources, it will be very helpful.
    Thanks and regards.

    Hi
    Do you mean that you want to search and retrieve the contents with a Java API? If yes, then you should use the RIDC API, which lets you do all the operations on UCM from Java.
    Search this forum and you will find sample code for the check-in, search, and check-out operations.
    Documentation is available at : http://docs.oracle.com/cd/E14571_01/doc.1111/e16819/toc.htm
    Thanks
    Srinath

  • How to get the content in embed swf file in Swf Loader on run time

    How to get the content in embed swf file in Swf Loader on run time
    [Bindable]
    [Embed(source="assets/index.swf")]
       private var SWFSRC:Class;
    <mx:SWFLoader id="_swfloader" source="{SWFSRC}" />

    Hi Flex harUI,
    It throws the error:
    Access of undefined property content
