Parse HTML entities with SAX

I am getting the following exception when I try to parse an XML document (as a String) containing HTML entities:
org.xml.sax.SAXParseException: The entity "Atilde" was referenced, but not declared.
for
<?xml version="1.0" encoding="UTF-8" ?><id="123" path="/bj&Atilde;&para;rk/">
...the code looks like:
InputSource input = new InputSource();
input.setCharacterStream(new StringReader(theXMLString));
reader.parse(input);
Is there a way to make SAX aware of those HTML entities?
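
One possible approach, sketched under assumptions: XML itself predefines only five entities (&amp;, &lt;, &gt;, &quot;, &apos;), so names such as Atilde and para have to be declared before they are referenced. If you control the string before it reaches the parser, you can place a DOCTYPE with an internal subset after the XML declaration. The class name, the helper names and the root element name "item" below are hypothetical (the sample in the question shows no element name), and only the two entities from the question are declared.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EntityDeclDemo {
    // Hypothetical helper: builds a document whose DOCTYPE declares the
    // two HTML entities used in the question. The body must not carry its
    // own XML declaration, because the DOCTYPE has to come after it.
    static String withEntities(String rootName, String body) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<!DOCTYPE " + rootName + " [\n"
             + "  <!ENTITY Atilde \"&#195;\">\n"
             + "  <!ENTITY para \"&#182;\">\n"
             + "]>\n"
             + body;
    }

    // Parses the string with SAX and returns the "path" attribute of the
    // first element that has one.
    static String parsePath(String xml) {
        final String[] path = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (path[0] == null) {
                    path[0] = atts.getValue("path");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return path[0];
    }
}
```

Calling EntityDeclDemo.parsePath(EntityDeclDemo.withEntities("item", "<item id=\"123\" path=\"/bj&Atilde;&para;rk/\"/>")) then returns the attribute value with both entities expanded. If the document already has a DOCTYPE pointing at an external DTD, an EntityResolver is the other common route.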

I am running into the same problem while trying to parse XML data from Yahoo Shopping web services for my price comparison mashup xatori.com. The exception I am getting is:
org.jdom.input.JDOMParseException: Error on line 80: The entity "Atilde" was referenced, but not declared.
     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:504)
     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:807)
     at com.abc.xyz(yhoo.java:39)
Caused by: org.xml.sax.SAXParseException: The entity "Atilde" was referenced, but not declared.
     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(Unknown Source)
     at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLScanner.scanAttributeValue(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanAttribute(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
     at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
     at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
     at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
     at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(Unknown Source)
     at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
     at org.jdom.input.SAXBuilder.build(SAXBuilder.java:489)
     ... 2 more
This happens only with specific (random) search queries... thoughts?

Similar Messages

  • How to Parse XML with SAX and Retrieving the Information?

    Hiya!
    I have written this code in one of my classes:
    /**Parse XML File**/
              SAXParserFactory factory = SAXParserFactory.newInstance();
              GameContentHandler gameCH = new GameContentHandler();
              try
              {
                   SAXParser saxParser = factory.newSAXParser();
                   saxParser.parse(recentFiles[0], gameCH);
              }
              catch(javax.xml.parsers.ParserConfigurationException e)
              {
                   e.printStackTrace();
              }
              catch(java.io.IOException e)
              {
                   e.printStackTrace();
              }
              catch(org.xml.sax.SAXException e)
              {
                   e.printStackTrace();
              }
              /**Parse XML File**/
              games = gameCH.getGames();
    And here is the content handler:
    import java.util.ArrayList;
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    class GameContentHandler extends DefaultHandler
    {
         private ArrayList<Game> games = new ArrayList<Game>();
         public void startDocument()
         {
              System.out.println("Start document.");
         }
         public void endDocument()
         {
              System.out.println("End document.");
         }
         public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException
         {
         }
         public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException
         {
         }
         public void characters(char[] ch, int start, int length) throws SAXException
         {
              /**for (int i = start; i < start+length; i++)
                   System.out.print(ch[i]);**/
         }
         public ArrayList<Game> getGames()
         {
              return games;
         }
    }
    And here is the XML I am trying to parse:
    <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
    <Database>
         <Name></Name>
         <Description></Description>
         <CurrentGameID></CurrentGameID>
         <Game>
              <gameID></gameID>
              <name></name>
              <publisher></publisher>
              <platform></platform>
              <type></type>
              <subtype></subtype>
              <genre></genre>
              <serial></serial>
              <prodReg></prodReg>
              <expantionFor></expantionFor>
              <relYear></relYear>
              <expantion></expantion>
              <picPath></picPath>
              <notes></notes>
              <discType></discType>
              <owner></owner>
              <location></location>
              <borrower></borrower>
              <numDiscs></numDiscs>
              <discSize></discSize>
              <locFrom></locFrom>
              <locTo></locTo>
              <onLoan></onLoan>
              <borrowed></borrowed>
              <manual></manual>
              <update></update>
              <mods></mods>
              <guide></guide>
              <walkthrough></walkthrough>
              <cheats></cheats>
              <savegame></savegame>
              <completed></completed>
         </Game>
    </Database>
    I have been trying for ages and just can't get the content handler class to extract a gameID and instantiate a Game to add to my ArrayList! How do I extract the information from my file?
    I have tried so many things in the startElement() method that I can't actually remember what I've tried and what I haven't! If you need to know, the Game class is instantiated as new Game(int gameID), and the rest of the variables are public.
    Please help someone...

    OK, how's this?
         public void startElement(String namespaceURI, String localName, String qualifiedName, Attributes atts) throws SAXException
         {
              current = "";
         }
         public void endElement(String namespaceURI, String localName, String qualifiedName) throws SAXException
         {
              try
              {
                   if(qualifiedName.equals("Game") || qualifiedName.equals("Database"))
                        {return;}
                   else if(qualifiedName.equals("gameID"))
                        {games.add(new Game(Integer.parseInt(current)));}
                   else if(qualifiedName.equals("name"))
                        {games.get(games.size()-1).name = current;}
                   else if(qualifiedName.equals("publisher"))
                        {games.get(games.size()-1).publisher = current;}
                   // etc...
                   else
                        {System.out.println("ERROR - Qualified name found in XML that does not exist as database field: " + qualifiedName);}
              }
              catch (Exception e) {} //Ignore
         }
         public void characters(char[] ch, int start, int length) throws SAXException
         {
              current += new String(ch, start, length);
         }

  • Parse XML with SAX

    Hi all, I have the following XML file:
    <n>
    <q>
    <r></r>
    <r></r>
    </q>
    </n>
    How can I automatically show the elements of a tree?
    For example:
    n includes r,r,q
    q includes r,r
    r includes r
    r includes r
    So that I can show the elements of all trees?
    Thanks very much.

    Read this tutorial:
    http://java.sun.com/webservices/docs/1.2/tutorial/doc/index.html
    It has some examples like that. If I knew what you wanted when you said "show" I could be more specific.

  • Changing xmlschema while parsing with sax

    Hello world,
    I'm using the SAX parser (Xerces) and I want to combine different schema files to parse an XML string:
    <root>
         <intervall>1,5</intervall>
         <nix>jetzt echt nix!</nix>
         <testtag>
              <text>huhu joe</text>
         </testtag>
         <theLast>das letzte element</theLast>
    </root>
    e.g. I want to parse the tag testtag with another schema.
    Does anyone have an idea or some sample code?
    thank you very much

    Thank you for the reply, but I want to change the schema file while parsing the XML string with SAX, without manipulating the XML string;
    the XML string I get is fixed;
    wish you all a great sunday

  • SAX Parser Validation with Schemas (C, C++ and JAVA)

    We are currently using the Oracle XML Parser for C to parse and validate XML data using the SAX parser interface with validation turned on. We currently define our XML with a DTD, but would like to switch to schemas. However, the Oracle XML Parser 9.2.0.3.0 (C) only validates against a DTD, not a schema when using the SAX interface. Are there plans to add schema validation as an option? If so, when would this be available? Also, the same limitation appears to be true for C++ and JAVA. When will any of these provide SAX parsing with schema validation?
    Thanks!
    John

    Will get back to you after checking with the development team...

  • Problems parsing & with SAX

    Hello all,
    I'm new to SAX and facing quite odd problem with it while parsing data.
    If the data to be parsed has &amps, SAX cuts out everything before it. For example;
    <url>www.foobar.com?something=123&blaa=456</url>
    This will return:
    blaa=456
    Any ideas what's wrong or should I post some sample code ?

    You didn't post the code that has this problem, but I think I know what is happening.
    You are probably assuming that the characters() method will always return you the entire text node all at once. This is not the case. You should write code like this:
    1. startElement: create a new StringBuffer.
    2. characters: append the data to that StringBuffer.
    3. endElement: now you have all the data in the StringBuffer and you can use it.
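
The three steps above can be sketched as follows. This is a minimal illustration rather than the original poster's code: the class name UrlCollector and the parse helper are hypothetical, StringBuilder is swapped in for StringBuffer, and the sample assumes the ampersand is written as &amp; in the source document (a bare & is not well-formed XML).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class UrlCollector extends DefaultHandler {
    private StringBuilder text;                    // buffer for the <url> being read
    final List<String> urls = new ArrayList<>();   // completed values

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        if ("url".equals(qName)) {
            text = new StringBuilder();            // step 1: fresh buffer
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);        // step 2: may run several times
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("url".equals(qName)) {
            urls.add(text.toString());             // step 3: the text is complete
            text = null;
        }
    }

    // Hypothetical convenience method: parse a string, return the <url> values.
    static List<String> parse(String xml) {
        try {
            UrlCollector handler = new UrlCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.urls;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Parsers commonly split the text around an entity reference into separate characters() calls, which would explain why only the part after the ampersand survived in the question.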

  • Can I parse non-wellformed XML with SAX at all?

    Hi all,
    I was wondering whether it's possible at all to parse XML that is not well-formed with SAX,
    e.g. an HTML file that doesn't close tags and stuff like that.
    I tried implementing the fatalError() method of the handler in a way that it consumes the exception but does not rethrow it.
    I also tried setting the validation property to false. Both with no success.
    Any help would be appreciated.
    thx
    philipp

    Your experiments tell you the answer.
    If you have HTML tag soup, why not just run it through JTidy or HTMLTidy to make it into well-formed XHTML?

  • Is there any possibility to combine XPath with SAX in Java?

    HI Gentlemen,
    I have an XML instance to parse. One solution works with XPath and DocumentBuilder. However, the tree in memory is too big, so I cannot build it in my storage (8 GB). Does anyone know a method where I use an XPath expression (Java) to select a node but with a better parser (e.g. SAX) which is not so space-hungry? Direct access to nodes is obligatory.
    Thanks, kind regards from
    Miklos HERBOLY

    SAX parsers do not build a DOM structure, and XPath requires a DOM structure to select elements from, so XPath is not usable with SAX directly. However, some analysers let you register the XPath expressions before invoking the SAX parser and then provide the results for those expressions.
    See
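
To illustrate the idea of matching while streaming, here is a minimal sketch that is not XPath and is not the API of the library linked in this reply. It only matches one absolute element path such as /Database/Game/name by tracking the open elements on a stack. The class name PathMatchHandler and the find helper are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathMatchHandler extends DefaultHandler {
    private final String wanted;                         // e.g. "/Database/Game/name"
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    private StringBuilder text;                          // non-null while inside a match
    final List<String> matches = new ArrayList<>();

    PathMatchHandler(String wanted) {
        this.wanted = wanted;
    }

    // Builds "/root/child/..." from the stack, root first.
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = open.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        open.push(qName);
        if (currentPath().equals(wanted)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (text != null && currentPath().equals(wanted)) {
            matches.add(text.toString());
            text = null;
        }
        open.pop();
    }

    // Hypothetical convenience method for trying the handler on a string.
    static List<String> find(String xml, String path) {
        try {
            PathMatchHandler handler = new PathMatchHandler(path);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.matches;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Memory use stays proportional to the nesting depth plus the matched text, not to the document size. Predicates, axes and anything else beyond a plain path would need a real streaming XPath implementation.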
    https://code.google.com/p/xpath4sax/

  • NullPointerException with SAX

    I have developed a CSV to XML parser using JAXP with SAX events to parse the CSV file into a DOM tree.
    Inside the parse() method I have the following code:
    public void parse(InputSource input) throws IOException, SAXException
    {
        BufferedReader br = null;
        if( input.getCharacterStream() != null )
            br = new BufferedReader( input.getCharacterStream() );
        else if( input.getByteStream() != null )
            br = new BufferedReader( new InputStreamReader( input.getByteStream() ) );
        else if( input.getSystemId() != null )
        {
            URL url = new URL( input.getSystemId() );
            br = new BufferedReader( new InputStreamReader( url.openStream() ) );
        }
        else
            throw new SAXException( "Objeto InputSource invalido" ); // "Invalid InputSource object"
        ContentHandler ch = getContentHandler();
        ch.startDocument();
        ch.startElement( "", "", "file", new AttributesImpl() );
        this.parseInput( br );
        ch.endElement( "", "", "file" );
        ch.endDocument();
    }
    The problem is that whenever the app gets to the ch.startDocument() statement it throws a java.lang.NullPointerException. I have no idea why this is happening. I have tested the very same code with the Xalan 2 and Xerces 2 parsers and it works without problems, but using the Oracle XML parser v2 throws the exception.
    Is this a bug? Should I set some of the Transformer's attributes to a specific value to avoid this? Where could I find more info on processing SAX events?
    Thanks,
    Fedro

    Fedro,
    Did you try it using XDK v10?

  • Problem parsing XML with schema when extracted from a jar file

    I am having a problem parsing XML with a schema, both of which are extracted from a jar file. I am using ZipFile to get InputStream objects for the appropriate ZipEntry objects in the jar file. My XML is encrypted so I decrypt it to a temporary file. I am then attempting to parse the temporary file with the schema using DocumentBuilder.parse.
    I get the following exception:
    org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element '<root element name>'
    This was all working OK before I jarred everything (i.e. when I was using standalone files, rather than InputStreams retrieved from a jar).
    I have output the retrieved XML to a file and compared it with my original source and they are identical.
    I am baffled because the nature of the exception suggests that the schema has been read and parsed correctly but the XML file is not parsing against the schema.
    Any suggestions?
    The code is as follows:
      public void open(File input) throws IOException, CSLXMLException {
        InputStream schema = ZipFileHandler.getResourceAsStream("<jar file name>", "<schema resource name>");
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = null;
        try {
          factory.setNamespaceAware(true);
          factory.setValidating(true);
          factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
          factory.setAttribute(JAXP_SCHEMA_SOURCE, schema);
          builder = factory.newDocumentBuilder();
          builder.setErrorHandler(new CSLXMLParseHandler());
        } catch (Exception builderException) {
          throw new CSLXMLException("Error setting up SAX: " + builderException.toString());
        }
        Document document = null;
        try {
          document = builder.parse(input);
        } catch (SAXException parseException) {
          throw new CSLXMLException(parseException.toString());
        }

    I was originally using getSystemResource, which worked fine until I jarred the application. The problem appears to be that resources returned from a jar file cannot be used in the same way as resources returned directly from the file system. You have to use the ZipFile class (or its JarFile subclass) to locate the ZipEntry in the jar file and then use ZipFile.getInputStream(ZipEntry) to convert this to an InputStream. I have seen example code where an InputStream is used for the JAXP_SCHEMA_SOURCE attribute but, for some reason, this did not work with the InputStream returned by ZipFile.getInputStream. Like you, I have also seen examples that use a URL, but they appear to be URLs that point to a file, not URLs that point to an entry in a jar file.
    Maybe there is another way around this, but writing to a file works, and I use File.deleteOnExit() to ensure things are tidied up afterwards.
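
One alternative that may avoid the temporary file, offered as a sketch and not as a confirmed fix for the setup described above: compile the schema from the stream with javax.xml.validation.SchemaFactory and attach it via DocumentBuilderFactory.setSchema, rather than passing the stream through the JAXP_SCHEMA_SOURCE attribute. The class and method names below are hypothetical. A schema that includes or imports other files by relative path would additionally need a system ID on the StreamSource.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

class StreamSchemaParser {
    // Both streams could come from ZipFile.getInputStream(...) or
    // Class.getResourceAsStream(...); here they are plain parameters.
    static Document parse(InputStream schemaStream, InputStream xmlStream)
            throws Exception {
        Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(schemaStream));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);   // schema validation; setValidating(true) is for DTDs

        DocumentBuilder builder = factory.newDocumentBuilder();
        // Without a handler, validation errors are not thrown to the caller.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXParseException { throw e; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(xmlStream);
    }

    // Hypothetical helper for trying the approach with in-memory strings.
    static boolean validates(String xsd, String xml) {
        try {
            parse(new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)),
                  new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```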

  • Edit an XML file with SAX

    Dear all, I am so confused...
    I have been trying for the last few days to understand how SAX works... The only thing I understood is:
    DefaultHandler handler = new Echo01();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    try {
        out = new OutputStreamWriter(System.out, "UTF8");
        SAXParser saxParser = factory.newSAXParser();
        saxParser.parse(file, handler);
    } catch (Throwable t) {
        t.printStackTrace();
    }
    System.exit(0);
    OK, I give the SAXParser the XML file and a handler. The parser parses and fires events that the handler catches. By implementing some handler interface or overriding the methods of an existing handler (e.g. the DefaultHandler class) I get to do stuff...
    But still, suppose I have implemented the startElement() method of the DefaultHandler class and I know that the parser is currently positioned on an element, e.g. <name>bob</name>. How do I get the value of the element, and if I manage to do that, how can I replace "bob" with "tom"?
    I would really appreciate any help given... just don't recommend http://java.sun.com/webservices/jaxp/dist/1.1/docs/tutorial/ because although there is interesting stuff in there, it does not solve my problem...

    Maybe SAX is not the right tool for you.
    With SAX, you implement methods like startElement and characters that get called as XML data is encountered by the parser. If you want to catch it or not, the SAX parser does not care. In your case, the "bob" part will be passed in one or more calls to characters. To safely process the data, you need to do something like build a StringBuffer or StringBuilder in the constructor of the class, and then in the startElement, if the name is one you want to read, set the length to zero. In the characters method, append the data to the StringBuilder or StringBuffer. In the endElement, do a toString to keep the data wherever you want.
    This works for simple XML, but may need to be enhanced if you have nested elements with string values that contain other elements.
    On the other hand, if your file is not huge, you could use DOM. With DOM, (or with JDOM, and I would expect with Dom4J -- but I have only used the first two) you do a parse and get a Document object with the entire tree. That allows you to easily (at least it is easy once you figure out how to do it) find a node like the "name" element and change the Text object that is its child from a value of "bob" to "tom". With DOM, you can then serialize the modified Document tree and save it as an XML file. SAX does not have any way to save your data. That burden falls to you entirely.
    Dave Patterson
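
The DOM route described above might look like the sketch below. It is an illustration under assumptions, not a drop-in solution: the class name RenameWithDom and the method replaceName are hypothetical, the element name "name" comes from the question, and the result is returned as a string where real code would probably write to a file.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RenameWithDom {
    // Replaces the text of every <name> element whose content equals
    // oldValue, then serializes the modified tree back to a string.
    static String replaceName(String xml, String oldValue, String newValue) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList names = doc.getElementsByTagName("name");
            for (int i = 0; i < names.getLength(); i++) {
                Node node = names.item(i);
                if (oldValue.equals(node.getTextContent())) {
                    node.setTextContent(newValue);
                }
            }

            // An identity transform is the standard JAXP way to write a DOM tree.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For example, RenameWithDom.replaceName("<people><name>bob</name></people>", "bob", "tom") yields a document in which the name element contains tom.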

  • What to do with SAX events

    I want to iterate over a database recordset and generate sax events to create a virtual xml document. But I'm struggling to see how the events are consumed.
    What do I do with the events that are generated by the start/end document and element handlers? How do I send them to a file, or better still, pass the events on to some tool to output as HTML/XML pages?
    Cheers again
    -thanks 4earlier code @Trejkaz

    All the examples I have ever seen of SAX are like this:
    You take an XML document and give it to a SAX parser. The SAX parser turns it into a stream of SAX events and calls your handler's startElement() etc. methods, which generally write to a file or something like that.
    Your requirement is the reverse, namely you want to input from the "something like that", make a stream of SAX events, and have those turned into an XML document. I have never seen a decent example of this so I had to work it out for myself. I posted my solution in this forum several months ago but I can't find it now. So here it is again:
    SAXTransformerFactory factory = (SAXTransformerFactory)TransformerFactory.newInstance();
    TransformerHandler handler = factory.newTransformerHandler();
    // if you want to use XSL to transform what you produce then
    // you need the version that takes a Templates argument.
    handler.setResult(new StreamResult(response.getWriter()));
    // in my case I send the resulting XML document to the servlet
    // response, but you could send it somewhere else.
    SAXParserFactory spf = SAXParserFactory.newInstance();
    XMLReader reader = spf.newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
    reader.setFeature("http://xml.org/sax/features/namespaces", true);
    reader.setFeature("http://xml.org/sax/features/namespace-prefixes", false);
    handler.startDocument();
    startElement(handler, "Doc");
    // I am producing an XML document whose root is a Doc element.
    // Send more SAX events here.
    endElement(handler, "Doc");
    handler.endDocument();
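
A self-contained variant of the same idea, as a sketch: the events are fired directly at the TransformerHandler, with a String array standing in for the database recordset. The class name SaxEventWriter, the method rowsToXml and the element name Row are hypothetical. The result goes to a StringWriter here, where the code above uses the servlet response writer.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

class SaxEventWriter {
    // Emits <Doc><Row>...</Row>...</Doc> by calling the ContentHandler
    // methods of a TransformerHandler, which serializes the events.
    static String rowsToXml(String[] rows) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler();
            handler.getTransformer()
                   .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            handler.setResult(new StreamResult(out));

            AttributesImpl noAttributes = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "Doc", "Doc", noAttributes);
            for (String row : rows) {
                handler.startElement("", "Row", "Row", noAttributes);
                // Special characters such as & are escaped by the serializer.
                handler.characters(row.toCharArray(), 0, row.length());
                handler.endElement("", "Row", "Row");
            }
            handler.endElement("", "Doc", "Doc");
            handler.endDocument();
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

No XMLReader appears in this sketch because every event is generated by hand. A reader is only needed when some of the events come from parsing an existing document.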

  • How to add attribute to Element with SAX

    Hi,
    I'm parsing XML document with SAX using DefaultHandler.
    How can I add attribute to start tag?

    Is this right????????????
    Yes, it's right. Everything everybody except you has said in this thread has been right.

  • Processing unfinished stream with SAX

    Hi,
    I'm just writing some kind of a jabber plugin in java. I've decided to use sax for parsing server responses. However I've encountered a problem with sax.
    saxParser.parse(inputStream, this);
    The problem is that events (such as startElement) are called only after the connection is closed (the stream's read method returns -1). Is there any way to force SAX to raise an event as soon as the tag is read?
    Any help will be greatly appreciated.
    Regards
    Versor
    Edited by: versor on Nov 2, 2007 3:20 AM

    versor wrote:
    ... Problem is, that events (such as startElement) are called after the connection (streams read method returns with -1) is closed. Is there any way to force sax to raise event as soon as the tag is read? ...
    Fully circumvent the problem parsing the buffered stream.

  • Parsing XML with DTD residing in jar file

    Hi,
    I have problems using the Crimson parser for my program under JDK 1.4.0b2. It attempts to parse an XML file with SAX. The corresponding DTD lies in a jar file in a different directory but is reachable through the classpath. All I get is an exception.
    org.xml.sax.SAXParseException: Relative URI "my.dtd"; kann nicht ohne eine Dokument-URI aufgelöst werden. (German: "cannot be resolved without a document URI.")
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3121)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3115)
    at org.apache.crimson.parser.Parser2.resolveURI(Parser2.java:2702)
    at org.apache.crimson.parser.Parser2.maybeExternalID(Parser2.java:2674)
    at org.apache.crimson.parser.Parser2.maybeDoctypeDecl(Parser2.java:1125)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:489)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:433)
    at org.xml.sax.helpers.XMLReaderAdapter.parse(XMLReaderAdapter.java:223)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:326)
    I used Xerces before and it worked fine. I already searched the community for that problem. All hints I found assume that xml file and dtd are in the same directory. Setting the systemId of the input source doesn't fix the problem.
    Is there anyone out there knowing what to do?
    Thanks,
    Thorsten

    Use a Resolver to map a PUBLIC name to a local name:
    <!DOCTYPE DOC PUBLIC "-//gaskin.de//XMLDOC 1.0//EN"
    "http://www.gaskin.de/dtd/xmldoc.dtd">
    public static void register() {
       ClassLoader loader = Resolver.class.getClassLoader();
       registerCatalogEntry(
          "-//gaskin.de//XMLDOC 1.0//EN",
          "de/gaskin/resources/dtd/XMLDOC.DTD",
          loader);
    }
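
With the standard SAX API, the same mapping can be sketched as an org.xml.sax.EntityResolver that serves the DTD from the classpath. The class name ClasspathDtdResolver is hypothetical, and the sketch assumes the document declares a PUBLIC identifier as in the DOCTYPE above.

```java
import java.io.InputStream;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class ClasspathDtdResolver implements EntityResolver {
    private final String publicId;      // e.g. "-//gaskin.de//XMLDOC 1.0//EN"
    private final String resourcePath;  // e.g. "de/gaskin/resources/dtd/XMLDOC.DTD"
    private final ClassLoader loader;

    ClasspathDtdResolver(String publicId, String resourcePath, ClassLoader loader) {
        this.publicId = publicId;
        this.resourcePath = resourcePath;
        this.loader = loader;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (this.publicId.equals(publicId)) {
            InputStream in = loader.getResourceAsStream(resourcePath);
            if (in != null) {
                InputSource source = new InputSource(in);
                source.setPublicId(publicId);
                source.setSystemId(systemId);
                return source;
            }
        }
        // Returning null tells the parser to use its default resolution.
        return null;
    }
}
```

It would be installed with reader.setEntityResolver(new ClasspathDtdResolver(...)) before calling parse. Whether this also cures the relative-URI error reported by Crimson above is not confirmed here, since that error may be raised before the resolver is consulted.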
