Counting lines of parsed HTML documents

Hello,
I am using an HTMLEditorKit.ParserCallback to handle data generated by a ParserDelegator.
Everything works, but I cannot find a way to detect line endings (I need to know on which line a tag or an attribute is found).
Thanks in advance for any hints.

I noticed that the parse() method of ParserDelegator creates a DocumentParser object to do the actual parsing of the HTML document. DocumentParser provides a getCurrentLine() method, so I tried extending ParserDelegator in order to reach the DocumentParser. However, getCurrentLine() is protected, so I ended up extending DocumentParser as well.
You probably have code something like:
new MyParserDelegator().parse(reader, this, false);
This should be replaced with:
parser = new MyParserDelegator();
parser.parse(reader, this, false);
where you have defined an instance variable: MyParserDelegator parser;
You can now use parser.getCurrentLine() in any of your parser callback methods.
Note that you may not always get the result you expect for the current line: I often found the reported line to be 1 greater than I thought it should be. Anyway, you can decide whether the code is of any value.
Following is the code for MyParserDelegator and its MyDocumentParser inner class. Good luck.
import java.io.IOException;
import java.io.Reader;
import java.io.Serializable;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.ParserDelegator;
public class MyParserDelegator extends ParserDelegator implements Serializable
{
     // Parser used by the most recent parse() call
     MyDocumentParser parser;
     public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet) throws IOException
     {
          String name = "html32";
          DTD dtd = createDTD( DTD.getDTD( name ), name );
          parser = new MyDocumentParser(dtd);
          parser.parse(r, cb, ignoreCharSet);
     }
     public int getCurrentLine()
     {
          return parser.getCurrentLine();
     }
     // Subclass that widens the protected getCurrentLine() to public
     public class MyDocumentParser extends DocumentParser
     {
          public MyDocumentParser(DTD dtd)
          {
               super(dtd);
          }
          public int getCurrentLine()
          {
               return super.getCurrentLine();
          }
     }
}
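The reply describes calling getCurrentLine() from the callback methods but does not show the callback side. The following is a minimal, self-contained sketch of that usage. The names LineTrackingDemo, LineParser, LineDelegator and tagLines are hypothetical and are not from the original post; the two nested classes simply repeat the idea of MyDocumentParser and MyParserDelegator so that the example compiles on its own. As the reply warns, the reported line may be off by one, so no exact output is claimed here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.DTD;
import javax.swing.text.html.parser.DocumentParser;
import javax.swing.text.html.parser.ParserDelegator;

public class LineTrackingDemo {
    /** Same idea as MyDocumentParser: widen the protected getCurrentLine(). */
    static class LineParser extends DocumentParser {
        LineParser(DTD dtd) { super(dtd); }
        @Override public int getCurrentLine() { return super.getCurrentLine(); }
    }

    /** Same idea as MyParserDelegator: keep a reference to the parser in use. */
    static class LineDelegator extends ParserDelegator {
        private LineParser parser;

        @Override
        public void parse(Reader r, HTMLEditorKit.ParserCallback cb, boolean ignoreCharSet)
                throws IOException {
            String name = "html32";
            parser = new LineParser(createDTD(DTD.getDTD(name), name));
            parser.parse(r, cb, ignoreCharSet);
        }

        int getCurrentLine() { return parser.getCurrentLine(); }
    }

    /** Maps each start tag name to the line the parser reported when the tag was first seen. */
    public static Map<String, Integer> tagLines(String html) {
        Map<String, Integer> lines = new LinkedHashMap<>();
        LineDelegator delegator = new LineDelegator();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Ask the parser which line it is on while this tag is being reported.
                lines.putIfAbsent(tag.toString(), delegator.getCurrentLine());
            }
        };
        try {
            delegator.parse(new StringReader(html), callback, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(tagLines("<html>\n<body>\n  <p>hi</p>\n</body>\n</html>"));
    }
}
```

Keeping the delegator in a variable that the callback can reach is the whole point of the change the reply suggests; an anonymous `new MyParserDelegator().parse(...)` leaves nothing to query.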

Similar Messages

Parsing HTML documents

I am trying to write an application that uses a parsed html document to perform some data retrieval. The problem that I am having is that the parser in JDK1.4.1 is unable to completely parse the document correctly. Some fields are skipped as well as other problems. I believe it has to do with the html32.bdtd. Is there a later version?

Parsing an HTML document is a huge task. You shouldn't do it yourself; javax.swing.text.html and javax.swing.text.html.parser already provide almost everything you need.
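As a rough illustration of data retrieval with those Swing packages, here is a minimal sketch that collects the href of every anchor. The class name LinkLister, the method hrefs and the sample markup are assumptions made for this example only; the parser classes themselves are the standard ones from javax.swing.text.html.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class LinkLister {
    /** Returns the href value of every <a> start tag in the given HTML. */
    public static List<String> hrefs(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        found.add(href.toString());
                    }
                }
            }
        };
        try {
            // 'true' tells the parser to ignore any charset directive in the document.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><body><a href=\"http://www.abc.com/areas\">hollywood</a></body></html>";
        System.out.println(hrefs(html));
    }
}
```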

Parse HTML document embedded in IFRAME

Dear fellows:
How can I access the contents of an HTML document embedded in an IFRAME tag using the Java class HTMLEditorKit.Parser?
It is well known that the contents of such an embedded HTML document can be accessed by JavaScript on the front end. However, I am more interested in processing it on the back end, using HTMLEditorKit.Parser or any Java Swing API.
Thanks for help.

The javax.swing.text.html framework barely supports HTML 3.2.

Parsing HTML characters (e.g. &nbsp;)

Hi
Apologies if I'm missing something obvious, I haven't been able to find an answer searching the API or Forums...
I'm parsing HTML documents (currently as Strings) to extract certain information. Is there an easy way to replace all special HTML characters such as &nbsp;, &lt; etc. with a space or < respectively, without having to do a string replace on every possible HTML character?
I know there's an HTML parser in swing but that seems to be geared towards creating an HTML editor.
Any help would be appreciated!

There are also a number of open source or shareware programs, such as TidyHTML, that clean-up and parse existing HTML. Check out Sourceforge or www.downloads.com.
- Saish
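For the entity question, one option that needs no extra library is the Swing parser itself, which resolves character entities before handing text to the callback. The sketch below assumes that behaviour and additionally maps the non-breaking space (U+00A0) to an ordinary space; EntityDecoder and plainText are names invented for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class EntityDecoder {
    /** Returns the text content of the HTML with character entities resolved. */
    public static String plainText(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // The parser has already turned entities such as &lt; into characters here.
                out.append(data);
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // &nbsp; arrives as the non-breaking space character; map it to a plain space.
        return out.toString().replace('\u00a0', ' ');
    }

    public static void main(String[] args) {
        System.out.println(plainText("<p>a &lt; b&nbsp;c &amp; d</p>"));
    }
}
```

This avoids a hand-written replace per entity, at the cost of also dropping the tags, which may or may not be what the extraction needs.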

Problem parsing a html document

Hi all,
I need to parse a html document.
InputStream is = new java.io.FileInputStream(new File("c:/temp/htmldoc.html"));
DOMFragmentParser DOMparser = new DOMFragmentParser();
DocumentFragment doc = new HTMLDocumentImpl().createDocumentFragment();
DOMparser.parse(new InputSource(is), doc);
NodeList nl = doc.getChildNodes();
I get just 3 of the following nodes...... though the document htmldoc.html is a proper html doc..
#document-fragment
HTML
#text
Any suggestions/help are most welcome. Thanks

Here's an example showing how to do this via javax.xml:
import java.io.*;
import java.net.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
public class HTMLElementLister {
     public static void main(String[] args) throws Exception {
          URLConnection con = new URL("http://www.mywebsite.com/index.html").openConnection();
          con.connect();
          InputStream in = con.getInputStream();
          Document doc = null;
          try {
               DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
               DocumentBuilder db = dbf.newDocumentBuilder();
               doc = db.parse(in);
          } finally {
               in.close();
          }
          // Print the top-level nodes, then the children of <html> and of <body>
          NodeList nodes = doc.getChildNodes();
          for (int i=0; i<nodes.getLength(); i++) {
               Node node = nodes.item(i);
               String nodeName = node.getNodeName();
               System.out.println(nodeName);
               if ("html".equalsIgnoreCase(nodeName)) {
                    System.out.println("|");
                    NodeList grandkids = node.getChildNodes();
                    for (int j=0; j<grandkids.getLength(); j++) {
                         Node contentNode = grandkids.item(j);
                         nodeName = contentNode.getNodeName();
                         System.out.println("|- " + nodeName);
                         if ("body".equalsIgnoreCase(nodeName)) {
                              System.out.println("   |");
                              NodeList bodyNodes = contentNode.getChildNodes();
                              for (int k=0; k<bodyNodes.getLength(); k++) {
                                   Node bodyNode = bodyNodes.item(k);
                                   System.out.println("   |- " + bodyNode.getNodeName());
                              }
                         }
                    }
               }
          }
     }
}

Parsing an HTML document

I want to parse an HTML document and replace its anchor tags with my own on the fly. Can anybody suggest how to do it, please?
Ajay

If your HTML files are not well-formed (as is likely with most HTML files), for example because attribute values are not enclosed in quotation marks, most XML parsers will fail.
Anand from this forum introduced JTidy to me and it worked very well. It is an HTML parser that is able to tidy up your HTML code.

How to parse a html document?

I am trying to parse an html document that I load from a url over the internet. The html is not well formed but thats ok. The problem is the document builder throws an exception because the document is not well formed.
Can I parse a html document using the document builder?
Please note that I set validating to false and the parse still fails with a fatal error saying that the <meta> tag must have a corresponding </meta> tag.
I am using code like the following.....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
DocumentBuilder db = factory.newDocumentBuilder();
doc = db.parse(urlString);

"The html is not well formed but thats ok."
No, it isn't.
"Validation" means checking that the XML conforms to a schema or a DTD. Don't confuse that with checking whether the XML is well-formed, which means whether it follows the basic rules of XML like opening tags have to have matching closing tags. Which is what your message is telling you -- your file isn't well-formed XML.
So sure, you can parse HTML or anything else with an XML parser, just be prepared to be told it isn't well-formed XML.
If you want to clean up HTML so that it's well-formed XML, there are products like HTMLTidy and JTidy that will do that for you.
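The difference between validation and well-formedness can be seen with a small sketch: validation is switched off, yet an unclosed <meta> is still rejected. WellFormedCheck and isWellFormed are hypothetical names for this illustration, and the sample markup is made up.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    /** Returns true if the markup parses as XML with validation switched off. */
    public static boolean isWellFormed(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD/schema validation requested
            DocumentBuilder db = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Well-formedness errors are fatal regardless of the validating flag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<html><head><meta name=\"a\"></head></html>"));
        System.out.println(isWellFormed("<html><head><meta name=\"a\"/></head></html>"));
    }
}
```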

Parsing an HTML document

Does anyone know how to parse an HTML document without having JEditorPane (in-the-neck) do it for you?

If you want to do a small amount of parsing, then it would make sense to write a custom program as described in the previous response. If, however, you want to do a lot of parsing, then it might make more sense to try to make use of an XML parser. If you are trying to parse html pages that are your own, then you might want to think about transforming them into xhtml so that an XML parser will be able to process them.
XML parsing is easy. I recently developed a web site for a set of mock exams for the Java Programmer Certification. Originally, I started developing the pages in HTML, but I quickly realized that I would have a hard time managing the exams in that format. I then organized the exam into a set of XML documents, one document for each topic. To publish a set of cross-topic exams, I use JDOM (with the help of SAX) to load all of the questions into the Java Collections Framework, where I can easily organize a set of four cross-topic exams. I also use JDOM to number the questions and answers before writing the new exams out to a new set of four XML files. Then I use XSLT to transform the four exam.xml documents into eight HTML files: four for the questions and four for the answers.
If you would like to take a look at the result, then please use the following link.
http://www.geocities.com/danchisholm2000/
If you own the html files that you want to parse, then I would try to find a way to transform them into valid xml. XHTML might be a good choice.
Dan Chisholm

How to parse XML document with default namespace with JDOM XPath

Hi All,
I am having difficulty using the Saxon and TagSoup parsers on an HTML document that declares a default namespace. The relevant content of this document is as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
    <div id="container">
        <div id="content">
            <table class="sresults">
                <tr>
                    <td>
                        <a href="http://www.abc.com/areas" title="Hollywood, CA">hollywood</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Jose, CA">san jose</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Francisco, CA">san francisco</a>
                    </td>
                    <td>
                        <a href="http://www.abc.com/areas" title="San Diego, CA">San diego</a>
                    </td>
              </tr>
</body>
</html>
Below are the relevant code snippets showing how I have attempted to retrieve the contents (the value of <a>):
             import java.util.*;
             import org.jdom.*;
             import org.jdom.xpath.*;
             import org.saxpath.*;
             import org.ccil.cowan.tagsoup.Parser;
( 1 )       frInHtml = new FileReader("C:\\Tmp\\ABC.html");
( 2 )       brInHtml = new BufferedReader(frInHtml);
( 3 ) //    SAXBuilder saxBuilder = new SAXBuilder("org.apache.xerces.parsers.SAXParser");
( 4 )       SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
( 5 )       org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);
( 6 )       XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[@id='container']/ns:div[@id='content']/ns:table[@class='sresults']/ns:tr/ns:td/ns:a");
( 7 )       xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");
( 8 )       java.util.List list = (java.util.List) (xpath.selectNodes(jdomDocument));
( 9 )       Iterator iterator = list.iterator();
( 10 )     while (iterator.hasNext())
( 11 )     {
( 12 )            Object object = iterator.next();
( 13 ) //         if (object instanceof Element)
( 14 ) //               System.out.println(((Element)object).getTextNormalize());
( 15 )             if (object instanceof Content)
( 16 )                   System.out.println(((Content)object).getValue());
( 17 )     }
This program would work on the same document without the default namespace, in which case it would not be necessary to include the ns prefix in the XPath statements (lines 6-7) either. Moreover, I have used org.apache.xerces.parsers.SAXParser in the past to successfully retrieve the content of <a> from the same document without the default namespace.
I would like to achieve the following objectives if possible:
( i ) Exclude the DTD and the namespace in order to simplify the parsing process. How could this be done?
( ii ) If this is not possible, how should the namespace be included in the XPath statements (lines 6-7) so that the value of <a> is picked up correctly?
( iii ) Would changing from org.apache.xerces.parsers.SAXParser to org.ccil.cowan.tagsoup.Parser make any difference as far as using XPath is concerned?
( iv ) Failing to exclude the DTD, how can the lookup of a PUBLIC DTD be changed to a local SYSTEM one so that a local DTD is used for reference?
I am running JDK 1.6.0_06, Netbeans 6.1, JDOM 1.1, Saxon6-5-5, Tagsoup 1.2 on Windows XP platform.
Any assistance would be appreciated.
Thanks in advance,
Jack

Here's an example of using a custom EntityResolver with the standard DocumentBuilder provided by the JDK. The code may or may not be similar for the parsers that you're using.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class ParseExamples
{
    private final static String COMMON_XML
        = "<music>"
        +     "<artist name=\"Anderson, Laurie\">"
        +         "<album>Big Science</album>"
        +         "<album>Strange Angels</album>"
        +     "</artist>"
        +     "<artist name=\"Fine Young Cannibals\">"
        +         "<album>The Raw &amp; The Cooked</album>"
        +     "</artist>"
        + "</music>";
    private final static String COMMON_DTD
        = "<!ELEMENT music (artist*)>"
        + "<!ELEMENT artist (album+)>"
        + "<!ELEMENT album (#PCDATA)>"
        + "<!ATTLIST artist name CDATA #REQUIRED>";
    public static void main(String[] argv)
    throws Exception
    {
        // this version uses just a SYSTEM identifier - note that it gets turned
        // into a file: URL
        String xml = "<!DOCTYPE music SYSTEM \"bar\">"
                   + COMMON_XML;
        // this version uses both PUBLIC and SYSTEM identifiers; the SYSTEM ID
        // gets munged, the PUBLIC ID doesn't
//        String xml = "<!DOCTYPE music PUBLIC \"foo\" \"bar\">"
//                   + COMMON_XML;
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setValidating(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        // the resolver ignores both IDs and always serves the in-memory DTD
        db.setEntityResolver(new EntityResolver()
        {
            public InputSource resolveEntity(String publicId, String systemId)
                throws SAXException, IOException
            {
                System.out.println("publicId = " + publicId);
                System.out.println("systemId = " + systemId);
                return new InputSource(new StringReader(COMMON_DTD));
            }
        });
        Document dom = db.parse(new InputSource(new StringReader(xml)));
        System.out.println("root element name = " + dom.getDocumentElement().getNodeName());
    }
}
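For questions (i) and (ii) in this thread, the following sketch uses only the JDK's own javax.xml.parsers and javax.xml.xpath classes rather than JDOM, Saxon or TagSoup, so the calls will differ for those libraries. It assumes that answering every DTD request with an empty document is acceptable (the DTD is then effectively skipped, and any entities it would have defined are unavailable), and it binds the prefix ns to the XHTML namespace through a NamespaceContext. NamespacedXPath and anchorTexts are names invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacedXPath {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Returns the text of every <a> inside the table with class "sresults". */
    public static List<String> anchorTexts(String xhtml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-qualified XPath steps
            DocumentBuilder db = dbf.newDocumentBuilder();
            // Serve an empty DTD for any external entity, so nothing is fetched.
            db.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
            Document doc = db.parse(new InputSource(new StringReader(xhtml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                public String getNamespaceURI(String prefix) {
                    return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "ns" : null;
                }
                public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("ns").iterator()
                            : Collections.<String>emptyList().iterator();
                }
            });
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//ns:table[@class='sresults']//ns:a", doc, XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><table class=\"sresults\"><tr>"
                + "<td><a href=\"http://www.abc.com/areas\">hollywood</a></td>"
                + "<td><a href=\"http://www.abc.com/areas\">san jose</a></td>"
                + "</tr></table></body></html>";
        System.out.println(anchorTexts(xhtml));
    }
}
```

The prefix used in the XPath expression does not have to match anything in the document; only the namespace URI it is bound to matters.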

Parsing html - txt

Hi, I'm new here, so I'd like to say 'hi' to everyone :) I'm a freshman in the Java community, so please be gentle.
First of all, I'd like to ask how to solve my first problem. I have code that converts an HTML document to a txt file. It looks like this:
import javax.swing.text.html.parser.*;
import javax.swing.text.html.*;
import javax.swing.text.*;
import javax.swing.*;
import java.io.*;
public class TextFromHtml extends HTMLEditorKit.ParserCallback{
  static String trimmss(String o){
    return (o.trim()).replaceAll("\\s+", " ");
  }
  public void handleText(char[] data, int pos){
    try{
      String o = new String(data);
      PrintWriter pw = new PrintWriter(new BufferedWriter(new FileWriter("test.txt", true)));
      o = trimmss(o);
      pw.println(o);
      //pw.close();
    }
    catch (IOException exc){}
    System.out.println(new String(data));
  }
  public static void main(String args[]){
    try{
      String s = JOptionPane.showInputDialog("Enter the name of the file you want to convert from html to txt.");
      Reader r = new FileReader(s);
      ParserDelegator parser = new ParserDelegator();
      HTMLEditorKit.ParserCallback callback = new TextFromHtml();
      parser.parse(r, callback, true);
    }
    catch (IOException e){
      e.printStackTrace();
    }
    System.exit(0);
  }
}
But unfortunately I don't know how to change this code so that the new txt file has no extra blank spaces and none of the empty \n lines. I'd like to get a txt file that contains the text line by line. Could you please help me and show me how to do that? I'd be very grateful.
Regards, skullscape.

Hi,
You may need to post this question @ http://forum.java.sun.com/forum.jspa?forumID=31
Thanks
Runa
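For the html-to-txt question above, one possible direction is to collect the text chunks first, normalise whitespace, drop chunks that end up empty, and write the file once at the end instead of opening a new writer in every handleText() call. This is only a sketch under those assumptions; HtmlTextLines and textLines are hypothetical names, and the file names are taken from the command line rather than from a dialog.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlTextLines {
    /** Returns one entry per non-empty text chunk, with runs of whitespace collapsed. */
    public static List<String> textLines(String html) {
        List<String> lines = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                String line = new String(data).trim().replaceAll("\\s+", " ");
                if (!line.isEmpty()) { // skip chunks that contained only whitespace
                    lines.add(line);
                }
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    /** Usage: java HtmlTextLines input.html output.txt */
    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        // Files.write puts each list entry on its own line and closes the file.
        Files.write(Paths.get(args[1]), textLines(html), StandardCharsets.UTF_8);
    }
}
```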

Invalid logging line: !DOCTYPE html PUBLIC "-//W3C//DTD XHTM error

Hi..
I am getting the following error while deploying a BPEL process containing Java embedding on WebLogic Server 11g.
[04:31:54 PM] Received HTTP response from the server, response code=200
[04:31:54 PM] Invalid logging line: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
[04:31:54 PM] Invalid logging line: <html xmlns="http://www.w3.org/1999/xhtml">
[04:31:54 PM] Invalid logging line: <head>
[04:31:54 PM] Invalid logging line: <title>GE | Web Site Verification</title>
[04:31:54 PM] Invalid logging line: <style type="text/css">
[04:31:54 PM] Invalid logging line: 
[04:31:54 PM] Invalid logging line: </style>
[04:31:54 PM] Invalid logging line: <script>
[04:31:54 PM] Invalid logging line: function show_lang(section) {
[04:31:54 PM] Invalid logging line:      document.getElementById('excerpt_EN').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('excerpt_SCN').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('excerpt_FR').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('excerpt_ES').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('excerpt_DE').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('excerpt_IT').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('excerpt_JP').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('footer_EN').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('footer_SCN').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('footer_FR').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('footer_ES').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('footer_DE').style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById('footer_IT').style.display = "none";
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById('footer_JP').style.display = "none";
[04:31:54 PM] Invalid logging line:      var excerpt = "excerpt_" + section;
[04:31:54 PM] Invalid logging line:      var footer="footer_"+
[04:31:54 PM] Invalid logging line: section;
[04:31:54 PM] Invalid logging line:      document.getElementById(excerpt).style.display = "block";
[04:31:54 PM] Invalid logging line:      document.getElementById(footer).style.display
[04:31:54 PM] Invalid logging line: = "block";
[04:31:54 PM] Invalid logging line: }
[04:31:54 PM] Invalid logging line: function show(section) {
[04:31:54 PM] Invalid logging line:      var excerpt = "excerpt_" + section;
[04:31:54 PM] Invalid logging line:      var full_text = "full_" + section;
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById(excerpt).style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById(full_text).style.display = "block";
[04:31:54 PM] Invalid logging line: }
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: function hide(section) {
[04:31:54 PM] Invalid logging line:      var excerpt = "excerpt_" + section;
[04:31:54 PM] Invalid logging line:      var full_text = "full_" + section;
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: document.getElementById(full_text).style.display = "none";
[04:31:54 PM] Invalid logging line:      document.getElementById(excerpt).style.display = "block";
[04:31:54 PM] Invalid logging line: }
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: /** BEGIN SCRIPS FOR REMEMBER SSO IMPLEMENTATION **/
[04:31:54 PM] Invalid logging line: function rememberSSOID(){
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line:      if (document.SSOForm.saveID.checked){
[04:31:54 PM] Invalid logging line:           saveSSOID();
[04:31:54 PM] Invalid logging line:      } else {
[04:31:54 PM] Invalid logging line:           clearSSOID();
[souhaitez continuer sur ce site.
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: Pour en savoir plus
[04:31:54 PM] Invalid logging line: sur le processus de protection de l'accès Internet GE, consultez les informations ci-dessous.</p>
[04:31:54 PM] Invalid logging line:      <p id="excerpt_ES"
[04:31:54 PM] Invalid logging line: style="display:none">La página a la que intenta acceder no es un sitio web de confianza de GE. Para protegerlo a usted y
[04:31:54 PM] Invalid logging line: a la Compañía, por favor, seguidamente inicie sesión para confirmar que desea continuar en este sitio. Para
[04:31:54 PM] Invalid logging line: obtener más datos respecto de la protección al acceso a Internet de GE, consulte la información que aparece
[04:31:54 PM] Invalid logging line: debajo.</p>
[04:31:54 PM] Invalid logging line:      <p id="excerpt_JP"
[04:31:54 PM] Invalid logging line: style="display:none">アクセスしようとして
[04:31:54 PM] Invalid logging line: いるサイトはGEが信賴していな
[04:31:54 PM] Invalid logging line: いウェブサイトです。自身と
[04:31:54 PM] Invalid logging line: 会社を守るため、下記にログ
[04:31:54 PM] Invalid logging line: インしてこのサイトを続けて
[04:31:54 PM] Invalid logging line: 閲覧することを確認してくだ
[04:31:54 PM] Invalid logging line: さい。 GE のインターネットアクセス保
[04:31:54 PM] Invalid logging line: 護プロセスについての詳細は
[04:31:54 PM] Invalid logging line: 、下記を参照してください。
[04:31:54 PM] Invalid logging line: </p>
[04:31:54 PM] Invalid logging line:      <p id="excerpt_DE" style="display:none">Die Website,
[04:31:54 PM] Invalid logging line:           <li>
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line:      </ul>
[04:31:54 PM] Invalid logging line: </div>
[04:31:54 PM] Invalid logging line: <div id="footer_JP" style="display:none">
[04:31:54 PM] Invalid logging line: <p><img
[04:31:54 PM] Invalid logging line: src="https://www.ge.com/security_proxy_sign_on/more-information.png"></p>
[04:31:54 PM] Invalid logging line:      <ul>
[04:31:54 PM] Invalid logging line:           <li>
[04:31:54 PM] Invalid logging line:
[04:31:54 PM] Invalid logging line: <div id="excerpt_one_JP"><a href="javascript:show('one_JP')
[04:31:54 PM] Invalid logging line: name="why_title">このページにアクセスする理由</a></div>
[04:31:54 PM] Invalid logging line:                     <div id="full_one_JP" style="display:none;">
[04:31:54 PM] Invalid logging line:                          <p><a
[04:31:54 PM] Invalid logging line: href="javascript:hide('one_JP')"
[04:31:54 PM] Invalid logging line: name="why_title">このページにアクセスする理由</a></p>
[04:31:54 PM] Invalid logging line:                          <p
[04:31:54 PM] Invalid logging line: class="why_desc">GE の企業情報を保護することは
[04:31:54 PM] Invalid logging line: 最優先事項です。
[04:31:54 PM] Invalid logging line: この情報は、物理的、電子的、手
[04:31:54 PM] Invalid logging line: 続き上のセーフガード（この認証
[04:31:54 PM] Invalid logging line: ページ等）により保護されます。
[04:31:54 PM] Invalid logging line: このページは、アクセスしている
[04:31:54 PM] ---- Deployment finished. ----

Hi:
Are you behind a proxy?
If so, you can configure it in JDeveloper under Tools -> Preferences -> Web Browser and Proxy.
Or you can deploy your composite directly in Enterprise Manager - Fusion Middleware Control.
thx
best

Document types show up twice when creating a new blank HTML document.

I followed the below instructions for getting HTML 5 as a document type option on CS4. Although I do have HTML 5 as an option now, my other stock CS4 document type options now show up twice in the menu. Any idea how to fix this?
To add the HTML document type, however, you need to modify the file named MMDocumentTypeDeclarations.xml. Here’s how to do that:
1. Find the configuration folder then navigate to the documenttypes folder. The Configuration folder can be found under the folder where you installed Dreamweaver.
2. Find and make a backup copy of MMDocumentTypeDeclarations.xml (MMDocumentTypeDeclarations.xml.bak)
3. Open MMDocumentTypeDeclarations.xml in your favorite text editor. If anything goes wrong, you can always get back to the default state by replacing your changes with the backup copy.
4. Go to the end of the document and find the </documenttypedeclarations> tag and add the following lines just before the </documenttypedeclarations> tag:
<documenttypedeclaration id="mm_html_5">
<title>
HTML5
</title>
<doctypedecl>
<![CDATA[<!DOCTYPE HTML>]]>
</doctypedecl>
<rootelement>
<![CDATA[<html></html>]]>
</rootelement>
<dtdcontext>html</dtdcontext>
<dtdcontext>html5</dtdcontext>
<dtdcontext>frameset_frame</dtdcontext>
<dtdcontext>xslt</dtdcontext>
</documenttypedeclaration>
5. Save the document
6. Restart Dreamweaver
You should now have an HTML5 document type option when you create a new document.
There was an issue with Dreamweaver CS4 that needed to be fixed in code to fully support HTML5 documents which is why upgrading to CS5 was recommended. In CS4, Dreamweaver’s definition of a well formed HTML document has a meta content type which isn’t required for HTML5 documents so, no matter what we do to CS4, any HTML document created by CS4 will have the following:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Replace that line with this:
<meta charset="utf-8">
You’ll need to do this every time you create a new HTML5 document.
Once you finish, go to Preferences > New Document and select HTML5 as the default Doc Type...
Thanks to:
What I did was create a snippet of the meta and add it to my snippets' meta folder
Jeff Booher
Dreamweaver Engineering
Thanks for your feedback.

Hi,
I have done a test and I can reproduce your issue.
I referred to the blog post about automatically creating Word documents which include list fields:
http://blogs.technet.com/b/brenclarke/archive/2009/04/15/automatically-create-word-documents-which-include-list-fields.aspx
When I created a multiple-lines-of-text column in the list and chose Plain text, the issue was solved.
Besides, here is a similar post you can take a look at:
https://social.technet.microsoft.com/Forums/sharepoint/en-US/a7ab3a61-6643-4a47-a464-fe46b5db1558/rich-text-field-showing-html-code
Best Regards,
Lisa Chen
TechNet Community Support

Why can't I make call to parse HTML from inside Thread?

This is driving me crazy. With a defined HTMLEditorKit.ParserCallback object "callback", I am attempting to parse an HTML document retrieved from a URL by using:
new ParserDelegator().parse(new InputStreamReader(url.openStream( )), callback, true);
It doesn't work if I initiate the call in any way from within the run method of a Thread subclass (the way I'd like to do it). If I make the call in the constructor of the Thread subclass, however, it runs fine. I know it must have something to do with the fact that parse runs in a Thread of its own, but the way to fix it isn't apparent to me.
I would appreciate some words from people who might know what's happening here... THANKS in advance.

Don't bother - figured it out - thanks.

JEditorPane parsing HTML

Hi all,
I am using JEditorPane and its ability to parse HTML, which, although relatively old and crusty, is certainly all I need for the job.
Now, I understand there is a chain of classes involved in taking my .html file and turning it into something we can see in a JEditorPane. For example, an img tag is picked up by HTMLEditorKit and turned into an ImageView for display purposes.
I want to do the following: I have subclassed HTMLEditorKit and overridden the HTMLFactory (although at the moment it just defers everything to super). I want to be able to pick out all of the HTML comment tags as they go through the HTMLEditorKit:
<!-- hey hey this is a comment -->
... and get to the comment text, "hey hey this is a comment", as a Java string. However, I've been digging around with Element for hours now, and although my HTMLFactory correctly picks out the comments from the rest of the elements:
else if (kind == HTML.Tag.COMMENT)
{
     System.out.println("I found a comment but don't know what it said!!");
}
... as you can see, I don't know how to get to the comment text itself.
The reason why I want access to the comment text is that I want to supplement the HTML code a little bit and add something in the comment that will affect the way it is rendered when I read it depending on the comment - so there's the reason if curious.
Any help, and I do mean anything at all, would be much appreciated, as this is the last obstacle in my path to getting this thing working :)
Thanks for your time!
- Peter

Here is some old code I have lying around that attempts to iterate through all the elements. If I remember correctly the comment text is found in the AttributeSet of the element:
import java.io.*;
import java.net.*;
import java.util.*;
import javax.swing.*;
import javax.swing.text.*;
import javax.swing.text.html.*;
class GetHTML
{
    public static void main(String[] args)
    {
        EditorKit kit = new HTMLEditorKit();
        Document doc = kit.createDefaultDocument();
        // The Document class does not yet handle charset's properly.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try
        {
            // Create a reader on the HTML content.
            Reader rd = getReader(args[0]);
            // Parse the HTML.
            kit.read(rd, doc, 0);
            System.out.println( doc.getText(0, doc.getLength()) );
            System.out.println("----");
            // Iterate through the elements of the HTML document.
            ElementIterator it = new ElementIterator(doc);
            Element elem = null;
            while ( (elem = it.next()) != null )
            {
                AttributeSet as = elem.getAttributes();
                System.out.println( "\n" + elem.getName() + " : " + as.getAttributeCount() );
                if ( elem.getName().equals( HTML.Tag.IMG.toString() ) )
                {
                    Object o = elem.getAttributes().getAttribute( HTML.Attribute.SRC );
                    System.out.println( o );
                }
                // 'enum' is a reserved word since Java 5, so the variable is called 'names'
                Enumeration<?> names = as.getAttributeNames();
                while( names.hasMoreElements() )
                {
                    Object name = names.nextElement();
                    Object value = as.getAttribute( name );
                    System.out.println( "\t" + name + " : " + value );
                    if (value instanceof DefaultComboBoxModel)
                    {
                        DefaultComboBoxModel model = (DefaultComboBoxModel)value;
                        for (int j = 0; j < model.getSize(); j++)
                        {
                            Object o = model.getElementAt(j);
                            Object selected = model.getSelectedItem();
                            if ( o.equals( selected ) )
                                System.out.println( o + " : selected" );
                            else
                                System.out.println( o );
                        }
                    }
                }
                if ( elem.getName().equals( HTML.Tag.SELECT.toString() ) )
                {
                    Object o = as.getAttribute( HTML.Attribute.ID );
                    System.out.println( o );
                }
                // Weird, the text for each tag is stored in a 'content' element
                if (elem.getElementCount() == 0)
                {
                    int start = elem.getStartOffset();
                    int end = elem.getEndOffset();
                    System.out.println( "\t" + doc.getText(start, end - start) );
                }
            }
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
        System.exit(1);
    }
    // Returns a reader on the HTML data. If 'uri' begins
    // with "http:", it's treated as a URL; otherwise,
    // it's assumed to be a local filename.
    static Reader getReader(String uri)
        throws IOException
    {
        // Retrieve from Internet.
        if (uri.startsWith("http:"))
        {
            URLConnection conn = new URL(uri).openConnection();
            return new InputStreamReader(conn.getInputStream());
        }
        // Retrieve from file.
        else
        {
            return new FileReader(uri);
        }
    }
}
To test it just use:
java GetHTML somefile.html
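If the comment text is only needed as a string, a simpler route than walking the element tree may be the parser callback, whose handleComment() method receives the comment characters directly. The sketch below takes that route, which is a different technique from the HTMLFactory approach in the question; CommentLister and comments are hypothetical names. Within the element tree itself, the attribute key to look for is most likely HTML.Attribute.COMMENT, in line with the reply's recollection that the text lives in the AttributeSet, but that is not verified here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class CommentLister {
    /** Returns the trimmed text of every HTML comment the parser reports. */
    public static List<String> comments(String html) {
        List<String> found = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleComment(char[] data, int pos) {
                // 'data' holds the characters between the comment delimiters.
                found.add(new String(data).trim());
            }
        };
        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<html><body><p>text</p><!-- hey hey this is a comment --></body></html>";
        System.out.println(comments(html));
    }
}
```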
