How to validate or parse HTML

Hi,
I am thinking of a way to parse text that can contain simple HTML tags like <a>, <br>, <i>; I have a clearly specified list of them.
Now, I would probably parse this text using dom4j and then somehow check the elements against my configurable list of allowed tags.
But all this amounts to writing a parser. Maybe you know of a free library that would do this for me? Or maybe you have already written such a parser to validate text entered by an application user that can contain specified HTML tags?

user5970066 wrote:
..I am thinking of some way to parse text in which I can have simple html tags like <a>, <br>, <i> - I have clearly specified list of them.
Quoted to change tags to show what the OP meant.
See here for a way to validate against a DTD.
Not sure if the linked technique is limited to validating XML.
Edited by: Andrew Thompson on Apr 11, 2011 5:58 PM
Added 'Not sure..'
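A minimal sketch of the whitelist idea without dom4j, assuming the JDK's built-in Swing HTML parser (javax.swing.text.html.parser.ParserDelegator) is acceptable instead of a third-party library. The class name TagWhitelist and the method onlyAllowedTags are hypothetical. It only reports whether a tag outside the allowed set was written; it does not check nesting or attributes, so a DTD-based check as suggested above would be stricter.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagWhitelist {

    /** Returns true if every tag written in text is in allowed (lower-case tag names). */
    public static boolean onlyAllowedTags(String text, Set<String> allowed) {
        final boolean[] ok = {true};
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private void check(HTML.Tag tag, MutableAttributeSet attrs) {
                // For a fragment the parser invents wrappers such as <html> and <body>;
                // those synthetic tags carry the IMPLIED attribute and are skipped.
                if (attrs.isDefined(HTMLEditorKit.ParserCallback.IMPLIED)) {
                    return;
                }
                if (!allowed.contains(tag.toString().toLowerCase(Locale.ROOT))) {
                    ok[0] = false;
                }
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                check(tag, attrs);
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> are reported here, not in handleStartTag.
                check(tag, attrs);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(text), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected when reading from a String
        }
        return ok[0];
    }

    public static void main(String[] args) {
        Set<String> allowed = Set.of("a", "br", "i");
        System.out.println(onlyAllowedTags("Hi <i>there</i><br><a href=\"#\">x</a>", allowed));
        System.out.println(onlyAllowedTags("Hi <script>alert(1)</script>", allowed));
    }
}
```

The allowed set is passed in rather than hard-coded, so it can be loaded from the configurable list mentioned in the question.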

Similar Messages

  • How to Validate HTML tags

    Hi,
    How can I check whether the opened HTML tags are closed?

    Use a parser, possibly a DOM parser. It will check that the input is well-formed XML.
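    A minimal sketch of that suggestion using the JDK's DOM parser (javax.xml.parsers). The class WellFormedCheck is hypothetical; it wraps the fragment in an assumed <root> element so that text with several top-level tags still forms a single document. Because this is an XML check, input that is acceptable HTML but not well-formed XML, such as a bare <br> or an &nbsp; entity, is rejected.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class WellFormedCheck {

    /** Returns true if the fragment, wrapped in one root element, is well-formed XML. */
    public static boolean isWellFormed(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Replace the default handler, which prints "[Fatal Error]" lines to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return true;
        } catch (SAXException e) {
            return false; // unclosed tag, bad nesting, undeclared entity, ...
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e); // not expected for an in-memory string
        }
    }

    public static void main(String[] args) {
        String[] samples = {"<p>Hello</p>", "<p>Hello", "Line<br>break", "Line<br/>break"};
        for (String s : samples) {
            System.out.println(s + " -> " + isWellFormed(s));
        }
    }
}
```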

  • How to validate HTML 5 in HomeSite

    It's frustrating to get validation errors in HomeSite when working with HTML 5 elements such as video and canvas.
    Luckily, there's a fairly simple way to enable these elements to be validated.
    Quick description below...
    1. Select XHTML 1.0 Strict as the DTD.
    2. Open the Validation panel in the Settings dialog: Options > Settings > Validation.
    3. Make sure XHTML 1.0 Strict is the only checked item in the Version column.
    4. Click the Validator Settings... button, select the Tags tab and select XHTML 1.0 Strict.
    5. Click Add... and add tags as appropriate, such as video, audio or canvas.
    6. For each new tag, click on the Attributes (or Required Attributes) folder and click Add... to add attributes (such as src or id). For each attribute, click on the attribute, click Add... and add a Value: double quoted string and Quoted string work for src and id.
    7. Click on the Context folder, click the Add... button and add the names of valid containing elements. For example, for video, add div. Do the reverse for elements that can be contained. For example, add video to the Context for p to enable p elements to be valid children of video elements.
    You should now be able to validate your code without errors.
    You can also set up HomeSite tag wizards for HTML 5 elements, but that's another story; more information here: http://evolt.org/article/Getting_XHTML_from_HomeSite_4_5/21/3466/.

    Thanks for posting this.
    jeff

  • How validate HTML using PL/SQL

    Hi,
    I am trying to validate user-entered HTML using PL/SQL.
    I created the function below for that purpose:
    CREATE OR REPLACE
    FUNCTION validate_html(
      p_html IN VARCHAR2
    ) RETURN BOOLEAN
    AS
      l_comment  XMLTYPE;
      xml_parse_err EXCEPTION;
      PRAGMA EXCEPTION_INIT (xml_parse_err , -31011);
    BEGIN
      l_comment := xmlType.createXML('<root><row>' || p_html || '</row></root>');
      RETURN TRUE;
    EXCEPTION WHEN xml_parse_err THEN
      RETURN FALSE;
    END;
    The function works OK and returns TRUE if I run, e.g.:
    BEGIN
      IF validate_html('<p>Hello</p>') THEN
        dbms_output.put_line('OK');
      ELSE
        dbms_output.put_line('Not valid HTML');
      END IF;
    END;
    It returns FALSE if I enter invalid HTML such as:
    BEGIN
      IF validate_html('<p>Hello') THEN
        dbms_output.put_line('OK');
      ELSE
        dbms_output.put_line('Not valid HTML');
      END IF;
    END;
    But it also returns FALSE if I run the following:
    BEGIN
      IF validate_html('<p>Hello &nbsp;</p>') THEN
        dbms_output.put_line('OK');
      ELSE
        dbms_output.put_line('Not valid HTML');
      END IF;
    END;
    The problem seems to be &nbsp;, which is valid HTML for me. (XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so the XML parser does not recognize &nbsp;.)
    I know that HTML is not XML, but can I use Oracle database XML functions for validating HTML?
    How should I validate user-entered HTML?
    I'm currently developing this using an Oracle XE 11g database.
    Regards,
    Jari

    It's not an elegant way, but try this:
    CREATE OR REPLACE FUNCTION validate_html (p_html IN VARCHAR2)
       RETURN BOOLEAN AS
       l_comment       XMLTYPE;
       xml_parse_err   EXCEPTION;
       PRAGMA EXCEPTION_INIT (xml_parse_err, -31011);
    BEGIN
       l_comment :=
          xmlType.createXML (
             '<root><row>'
             || CASE
                   WHEN INSTR (p_html, '&') > 0 THEN
                      UTL_I18N.escape_reference (p_html)
                   ELSE
                      p_html
                END
             || '</row></root>');
       RETURN TRUE;
    EXCEPTION
       WHEN xml_parse_err THEN
          RETURN FALSE;
    END;
    /
    SET DEFINE OFF
    SET SERVEROUTPUT ON
    BEGIN
       IF validate_html ('<p>Hello') THEN
          DBMS_OUTPUT.put_line ('OK');
       ELSE
          DBMS_OUTPUT.put_line ('Not valid HTML');
       END IF;
    END;
    /
    BEGIN
       IF validate_html ('<p>Hello &nbsp;</p>') THEN
          DBMS_OUTPUT.put_line ('OK');
       ELSE
          DBMS_OUTPUT.put_line ('Not valid HTML');
       END IF;
    END;
    /
    Cheers,
    Manik.
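    Another way around the &nbsp; problem is to tell the XML parser about the HTML entities you accept by declaring them in an internal DTD subset placed before the content. The sketch below shows the idea in Java with the JDK's SAX parser rather than in PL/SQL; the entity list and the class name EntityAwareCheck are hypothetical, and whether XMLType.createXML accepts the same DOCTYPE prefix has not been checked here.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EntityAwareCheck {

    // Hypothetical list of accepted HTML entities, each mapped to its numeric
    // character reference. Extend it with whatever entities users may type.
    private static final String DOCTYPE =
            "<!DOCTYPE root ["
            + "<!ENTITY nbsp \"&#160;\">"
            + "<!ENTITY copy \"&#169;\">"
            + "<!ENTITY eacute \"&#233;\">"
            + "]>";

    /** True if the fragment is well-formed XML once the entities above are declared. */
    public static boolean isValidFragment(String fragment) {
        String doc = DOCTYPE + "<root>" + fragment + "</root>";
        try {
            // DefaultHandler rethrows fatal errors instead of printing them.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(doc)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // unclosed tag or an entity that is not in the list
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidFragment("<p>Hello &nbsp;</p>"));
        System.out.println(isValidFragment("<p>Hello &bogus;</p>"));
    }
}
```

Unlike escaping every ampersand, this still rejects entities you have not listed, so the list doubles as a whitelist.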

  • How to validate Email Address in HTML DB Application

    Hi,
    I have developed an Employee Login Details form in HTML DB, but I am unable to validate the email address, because HTML DB does not seem to support string functions like indexOf(char c), substring(int), etc. Can anybody help me check whether an email address contains the @ and . symbols?
    Thanks in advance.

    user529382,
    You may be able to use Regular Expressions instead, if you do a search in this forum for 'regex' you should find a few hits.
    While I agree that using a regular expression is a great way to verify that the user has entered an email address that conforms to the regular expression rules, it is still nothing more than that....conforming to the regular express rules.
    The only way to 100% confirm that an email address is 'valid', is to actually send an email to it, so what I tend to do is to get the user to enter their email twice (in a user registration screen for example), that way you can minimize the chance of 'typos', then send out a 'verification email' that the user has to click a link on to verify they have received it (I'm sure you've seen this type of system before), only when the confirmation is received would I then make the account 'active'.
    Hope this helps.
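    A sketch of the regular-expression approach, shown in Java because the pattern is the portable part. The pattern and the class name EmailShape are assumptions: they only check for something@something.something with no whitespace and a single @. As the reply says, passing this check does not prove the mailbox exists.

```java
import java.util.regex.Pattern;

public class EmailShape {

    // Hypothetical "looks like an address" rule: local part, one '@',
    // then a domain containing at least one dot; no whitespace anywhere.
    private static final Pattern SHAPE = Pattern.compile("[^@\\s]+@[^@\\s]+\\.[^@\\s]+");

    public static boolean looksLikeEmail(String s) {
        return s != null && SHAPE.matcher(s).matches(); // matches() requires the whole string
    }

    public static void main(String[] args) {
        String[] samples = {"jane.doe@example.com", "jane.example.com", "jane@example", "a@b@c.com"};
        for (String s : samples) {
            System.out.println(s + " -> " + looksLikeEmail(s));
        }
    }
}
```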

  • How to convert HTML text to plain text?

    If I am lucky, I hope to get some really good answers. Basically, I have an HTML file saved on my hard drive, and I want to convert that HTML file to a plain text file.
    I tried implementing something myself. My algorithm was to simply match "<" and ">" and get rid of anything in between. It works, but it's not fully cooked. It doesn't get rid of character entities like &XXXX;, which HTML uses to encode special characters. Also, it doesn't remove JavaScript code or anything that's not contained in <BODY>...</BODY>. I can certainly improve my program, but I am also hoping to see a faster and more efficient way to deal with this.
    I browsed through some old topics, but nothing was satisfactory. Someone suggested the use of JEditorPane:
    int len = pane.getDocument().getLength();
    try {
        String text = pane.getDocument().getText(0, len);
        System.out.println(text);
    } catch (Exception e) {
        System.exit(0);
    }
    But the problem with that is how the JEditorPane object should be instantiated. I could do something like JEditorPane pane = new JEditorPane(URL url) or JEditorPane pane = new JEditorPane(String url). Either way, the program takes time to download the HTML page from the corresponding URL. If one checks "len" right away, it is likely to be zero, because the HTML page has not been fully loaded yet. I guess I could try to deal with that problem by creating threads and so on, but I hope there are better solutions to end this misery once and for all.

    Here is a link to an article on the "Swing HTML Parser":
    http://java.sun.com/products/jfc/tsc/articles/bookmarks/index.html
    This is an example of how you might use the parser callback to solve your problem:
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import javax.swing.text.MutableAttributeSet;
    import javax.swing.text.html.HTML;
    import javax.swing.text.html.HTMLEditorKit;
    import javax.swing.text.html.parser.ParserDelegator;
    public class TestParser extends HTMLEditorKit.ParserCallback {
         // true while inside <style> or <script>, whose content is not page text
         boolean ignoreText;
         public static void main(String[] args)
         throws IOException {
              TestParser parser = new TestParser();
              // args[0] is the file to parse
              Reader reader = new FileReader(args[0]);
              try {
                   new ParserDelegator().parse(reader, parser, false);
              } catch (IOException e) {
                   System.out.println(e);
              } finally {
                   reader.close();
              }
         }
         public void handleComment(char[] data, int pos) {
    //          System.out.println(data);
         }
         public void handleEndOfLineString(String eol) {
         }
         public void handleEndTag(HTML.Tag tag, int pos) {
    //          System.out.println("/" + tag);
              if (tag.equals(HTML.Tag.STYLE)
              ||  tag.equals(HTML.Tag.SCRIPT) )
                   ignoreText = false;
         }
         public void handleError(String errorMsg, int pos) {
    //          System.out.println(pos + ":" + errorMsg);
         }
         public void handleMutableTag(HTML.Tag tag, MutableAttributeSet a, int pos) {
    //          System.out.println("mutable:" + tag + ": " + pos + ": " + a);
         }
         public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet a, int pos) {
    //          System.out.println( tag + ":" + a );
         }
         public void handleStartTag(HTML.Tag tag, MutableAttributeSet a, int pos) {
    //          System.out.println( tag + ":" + a );
              if (tag.equals(HTML.Tag.STYLE)
              ||  tag.equals(HTML.Tag.SCRIPT) )
                   ignoreText = true;
         }
         public void handleText(char[] data, int pos) {
              if (! ignoreText)
                   System.out.println( data );
         }
    }
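    If the JEditorPane route from the question is preferred, the loading delay can be avoided by parsing the HTML synchronously into a document yourself instead of letting the pane load a URL. A minimal sketch, assuming the page is already available as a String (for example read from the file on disk); HtmlToText is a hypothetical name. Entities such as &amp; come back decoded, and the text follows the document model, so paragraph breaks appear as newlines.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlToText {

    /** Parses the HTML synchronously into an HTMLDocument and returns its text content. */
    public static String toPlainText(String html) {
        HTMLEditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // A <meta ... charset=...> tag would otherwise make the parser stop with ChangedCharSetException.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try {
            kit.read(new StringReader(html), doc, 0); // returns only when parsing is complete
            return doc.getText(0, doc.getLength());
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("could not parse HTML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPlainText("<html><body><p>Hello <b>world</b> &amp; bye</p></body></html>"));
    }
}
```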

  • How to extract HTML page from the internet

    I am new to Java. I wish to know how to extract an HTML page from the Internet, and also how to tell the images apart from the text information.

    You can create a java.net.URL that points to the file you want to "extract" and read the HTML code (or whatever that file contains) from there, using the InputStream returned by URL.openStream().
    The difference between images and text: images are embedded in HTML using the img tag, for example: <IMG src="http://forum.java.sun.com/images/reply.gif" alt="Reply">. The width, height and alt attributes are sometimes left out, there may or may not be quotes around the values, and everything is case-insensitive, so you'll have a hard time parsing the input yourself. I'd suggest using an existing parser.
    What are you trying to do anyway? You can load a URL directly into a JEditorPane with the setPage(URL page) method...
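    A minimal sketch of the URL.openStream() approach described above. The class and method names are hypothetical, and the caller has to supply the charset, since a real page may declare its encoding in an HTTP header or a <meta> tag that this code does not inspect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlFetch {

    /** Reads the whole resource behind the URL and decodes it with the given charset. */
    public static String fetch(URL url, Charset charset) throws IOException {
        try (InputStream in = url.openStream()) {          // closed automatically
            return new String(in.readAllBytes(), charset);  // readAllBytes needs Java 9+
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default address; pass any http(s) or file: URL on the command line.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        System.out.println(fetch(URI.create(address).toURL(), StandardCharsets.UTF_8));
    }
}
```

The returned string is the raw HTML; separating <img> tags from text is then a job for an HTML parser such as the ParserCallback examples elsewhere on this page.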

  • How to extract HTML table contents

    Does anyone know how to extract HTML table contents? I want to download an HTML file containing a table from the Internet and extract the table contents. Finally, I want to insert the table contents into a database.

    To do this you have to use a parser to parse your HTML file and retrieve the information you want.
    Please have a look at the following classes:
    HTMLEditorKit.ParserCallback
    ParserDelegator()
    Here is an example which retrieves the FRAME src attributes of an HTML file. The purpose here is to find out whether the HTML file describes a multi-frame page. If so, it adds each frame's src to a Vector:
    // Logger, f, currentFrameSrc and aFile come from the surrounding application
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
         public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
              if (t.equals(HTML.Tag.FRAME)) {
                   Logger.debug(this, "Frame tag found in " + f.getURL());
                   Enumeration e = a.getAttributeNames();
                   while (e.hasMoreElements()) {
                        Object name = e.nextElement();
                        if (name.toString().equals("src")) {
                             Object ob = a.getAttribute(name);
                             Logger.debug("found an src " + ob);
                             currentFrameSrc.add(ob.toString());
                        }
                   }
              }
         }
    };
    Reader reader = new FileReader(aFile);
    new ParserDelegator().parse(reader, callback, false);
    It's not clean but I hope it will help :-)
    Stephane
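    Applied to the original question (table cells rather than frames), the same ParserCallback approach might look like the sketch below. TableExtractor and extractRows are hypothetical names, and the sketch assumes tables are not nested (nested tables would need a stack of rows). Each row list could then be bound to an INSERT statement through JDBC.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TableExtractor {

    /** Returns the trimmed text of every <td>/<th> cell, grouped by <tr> row. */
    public static List<List<String>> extractRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private List<String> row;    // row being filled, or null outside <tr>
            private StringBuilder cell;  // cell being filled, or null outside <td>/<th>

            @Override
            public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
                if (t == HTML.Tag.TR) {
                    row = new ArrayList<>();
                } else if ((t == HTML.Tag.TD || t == HTML.Tag.TH) && row != null) {
                    cell = new StringBuilder();
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (cell != null) {
                    cell.append(data); // a cell's text may arrive in several pieces
                }
            }

            @Override
            public void handleEndTag(HTML.Tag t, int pos) {
                if ((t == HTML.Tag.TD || t == HTML.Tag.TH) && cell != null && row != null) {
                    row.add(cell.toString().trim());
                    cell = null;
                } else if (t == HTML.Tag.TR && row != null) {
                    rows.add(row);
                    row = null;
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        String html = "<table><tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr></table>";
        System.out.println(extractRows(html));
    }
}
```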

  • How to Generate HTML Report Output in Excel

    Dear Experts,
    How do I convert HTML report output to Excel?
    I have reports whose output comes in HTML format, and I want to use the same output in Excel.
    Please tell me how I can convert it to Excel.
    Thanks
    Sudhir

    hello,
    in your case, you might want to do the following:
    a) use DESFORMAT=HTML
    b) use MIMETYPE=application/vnd.ms-excel (or whatever MIME type your Excel application is bound to)
    I am not sure whether Excel will understand our HTMLCSS output (which is actually HTML 4.0 using layers for the best possible rendering of the paper page in the browser).
    regards,
    the oracle reports team

  • How to use HTML Tags in Smartforms

    Hi,
    Can you please help me understand how to use HTML tags in Smartforms?
    Suppose I want to display some text in bold; should I use the <b> tag as shown below?
    <b>Header Information</b>
    regards
    Ranveer

    Hi Ranveer ,
        check the following link; I hope it helps you:
    http://www.sdn.sap.com/irj/servlet/prt/portal/prtroot/com.sap.km.cm.docs/library/webas/abap/abap%20code%20samples/smartforms/smartform%20in%20abap.pdf (about HTML in Smartforms)
    rgds,
    shan

  • How to change HTML-style titles of a JTabbedPane at run time?

    Hi,
    How can I change the HTML-style titles of a JTabbedPane at run time?
    setTitleAt is not working...

    You can't change the canvas at runtime. But you can put the scrollbar on a stacked canvas and then show or hide that stacked canvas on different canvases.

  • How to upload html file to email

    Hi,
    I have absolutely no HTML background, but with the help of online tutorials I was able to create a single-page layout with graphics, text and links using Adobe GoLive, and I can now open the file in Dreamweaver.
    My goal is not to publish a webpage but to create an HTML file with images and links to specific websites that can be emailed to people. The file must be part of the email body, so that people can immediately view the images and text and click the links, rather than receiving it as an attachment.
    Would appreciate it very much if someone can help me out.
    Thanks.
    Newbie

    The simplest way to handle this would be to upload the page and files to a remote server and send your email buddies a link to the site. Barring that....
    How to Code HTML Emails
    http://www.sitepoint.com/article/code-html-email-newsletters/
    Use inline CSS styles for fonts and background colors. Use tables to hold your page elements. Use absolute URLs to images hosted on your web server, because attached images may not get through. Give people who can't see your HTML page in their email client a link to the HTML page on your web server.
    Download some HTML email templates from the links below to help you get started.
    Mail Chimp
    http://www.mailchimp.com/resources/html_email_templates/
    Constant Contact
    http://www.constantcontact.com/email-marketing/html-email-templates/index.jsp
    Nancy O.
    Alt-Web Design & Publishing
    Web | Graphics | Print | Media Specialists
    www.alt-web.com/
    www.twitter.com/altweb
    www.alt-web.blogspot.com

  • How to send HTML mail with images multipart/related message

    Hi,
    Could anybody tell me how to send HTML mail with images as a "multipart/related" message? If anybody can give me the code, it would be helpful.
    Thanks

    Hi!
    Refer to
    http://developer.java.sun.com/developer/onlineTraining/JavaMail/index.html
    I've found it very helpful.
    Look at the last part for code showing how to send HTML mail!
    Regards
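    For reference, a multipart/related message (RFC 2387) is a MIME container whose first part is the HTML and whose other parts are resources the HTML points at through cid: URLs that match each part's Content-ID header. The sketch below assembles such a body by hand with only the JDK, purely to show the structure; in practice JavaMail's MimeMultipart("related") builds this for you. The class name, boundary and content ID are hypothetical, and the image bytes are a placeholder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class RelatedMessageSketch {

    private static final String CRLF = "\r\n";

    /** Builds a multipart/related body: an HTML root part plus one image referenced as cid:contentId. */
    public static String build(String html, byte[] gifBytes, String contentId, String boundary) {
        StringBuilder sb = new StringBuilder();
        sb.append("MIME-Version: 1.0").append(CRLF);
        sb.append("Content-Type: multipart/related; type=\"text/html\"; boundary=\"")
          .append(boundary).append('"').append(CRLF).append(CRLF);

        // Root part: the HTML that the mail client displays.
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Type: text/html; charset=UTF-8").append(CRLF).append(CRLF);
        sb.append(html).append(CRLF);

        // Related part: the image, labelled with the Content-ID the HTML refers to.
        sb.append("--").append(boundary).append(CRLF);
        sb.append("Content-Type: image/gif").append(CRLF);
        sb.append("Content-Transfer-Encoding: base64").append(CRLF);
        sb.append("Content-ID: <").append(contentId).append('>').append(CRLF).append(CRLF);
        sb.append(Base64.getMimeEncoder().encodeToString(gifBytes)).append(CRLF);

        // Closing delimiter ends the multipart body.
        sb.append("--").append(boundary).append("--").append(CRLF);
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] placeholder = "GIF89a".getBytes(StandardCharsets.US_ASCII); // not a complete image
        String html = "<html><body><p>Logo:</p><img src=\"cid:logo123\"></body></html>";
        System.out.print(build(html, placeholder, "logo123", "BOUNDARY-42"));
    }
}
```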

  • How to send HTML email to End User using OOTB email processs?

    Hi,
    We are using the OOTB Send Email process to send email to end users.
    Templates have been created inside the /etc/workflow/ProjectName/email folder.
    It works properly for plain-text email, but with an HTML template it sends the email with the HTML tags visible.
    Any pointers on how to write an HTML template for the OOTB email process and set the email type to HTML?
    Thanks
    Deepika

    Thanks Sham.
    I am able to send HTML email by following the above link.
    The problem I am facing is that when I deploy the code through CRXDE, the code works fine.
    But when I deploy using Maven, the bundle gets activated, but at the line:
    messageGateway = this.messageGatewayService.getGateway(HtmlEmail.class);
    it throws a NullPointerException.
    Any pointers on why it's not working with the Maven deployment?
    Regards
    Deepika

  • How to put an HTML file into a canvas?

    How do I put an HTML file into a canvas?

    Hi,
    This would require you to write a JavaBean that interprets the HTML (there are commercial versions of this available).
    Frank

Maybe you are looking for

  • Windows 7 and Arch Linux dual boot problem

    Hey guys, I had an issue with Windows 7 dual-booted with Arch: my Windows 7 crashed. I reinstalled Windows and all my partitions are set up, but now when I try to boot Arch I can't at all, because Windows Boot took over an

  • Business Area in Settlement

    Dear All, while executing the settlement run for the production order, the business area is not updated in the entry. Because of this, I cannot differentiate by business area in my COPA report. Kindly advise how to make the business area

  • Date showing with time in siena

    Hello everyone! I am facing a problem: my Date field, which comes from an XL data source, is shown with a time in Siena, i.e. 10-12-2014 is shown as "10-12-2014 12:00 AM". How do I fix it? Waiting for a reply! Thanks in advance, saravana

  • How do I turn off automatic Backup Assistant?

    My LG Cosmo automatically runs Backup Assistant every night. Is there any way to turn it off?

  • Matching ABAP Roles with UME Groups

    Hello, we are facing the following issue: we are providing Business Warehouse access via the NW Portal alongside the "normal" ABAP system. Therefore we need to put every new user into a special UME group. How can we match ABAP roles with UME groups? We just