Problems parsing & with SAX

Hello all,
I'm new to SAX and am facing quite an odd problem with it while parsing data.
If the data to be parsed contains ampersands (escaped as &amp;), SAX cuts out everything before them. For example:
<url>www.foobar.com?something=123&amp;blaa=456</url>
This will return:
blaa=456
Any ideas what's wrong, or should I post some sample code?

You didn't post the code that has this problem, but I think I know what is happening.
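You are probably assuming that the characters() method will always deliver the entire text node in one call. This is not the case: the parser may split the text into several chunks, for example around an entity reference such as &amp;. You should write code like this: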
1. startElement: create a new StringBuffer.
2. characters: append the data to that StringBuffer.
3. endElement: now you have all the data in the StringBuffer and you can use it.
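The three steps above can be sketched as a SAX2 handler. This is a minimal illustration, assuming the JDK's built-in JAXP parser; the class name UrlHandler and the helper parseUrl are hypothetical, and a StringBuilder stands in for the StringBuffer (same idea, no synchronization).

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class UrlHandler extends DefaultHandler {
    // Collects all the character data of the current element.
    private final StringBuilder text = new StringBuilder();
    private String url;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Step 1: start a fresh buffer for each element.
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Step 2: one text node may arrive in several chunks
        // (for example before and after an &amp; reference), so append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Step 3: the whole text node is now available.
        if (qName.equals("url")) {
            url = text.toString();
        }
    }

    public static String parseUrl(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        UrlHandler handler = new UrlHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.url;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parseUrl("<url>www.foobar.com?something=123&amp;blaa=456</url>"));
    }
}
```

Resetting the buffer in startElement is enough for flat elements like <url>; nested mixed content would need a stack of buffers.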

Similar Messages

Changing xmlschema while parsing with sax

Hello world,
I'm using the SAX parser (Xerces) and I want to combine different schema files to parse an XML string:
<root>
     <intervall>1,5</intervall>
     <nix>jetzt echt nix!</nix>
     <testtag>
          <text>huhu joe</text>
     </testtag>
     <theLast>das letzte element</theLast>
</root>
E.g. I want to parse the tag testtag with another schema.
Does anyone have an idea or some sample code?
Thank you very much

Thank you for the reply, but I want to change the schema file while parsing the XML string with SAX, without manipulating the XML string;
the XML string I get is fixed.
Wish you all a great Sunday

Validating parser with SAX and SCHEMA

Hi,
Can I validate an XML file against a schema with Sun's SAX parser?
If I can, which versions do I need?
Are JSDK 1.3 and SAX 1.0 enough?
What is JAXP? Is it a version of SAX?
Thanks.
Sorry for my English!

I tried this document with the file source, but it doesn't work; I get the following error message:
java.lang.IllegalArgumentException: No attributes are implemented
Download the latest copy of JAXP from http://java.sun.com
ERROR! INVALID FILE
But I have JAXP installed with the Java(TM) XML Pack, and I didn't change the source; I just copied it to my hard disk and ran the .java.
Why do I get these errors, please?
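For readers on a newer JDK, a minimal sketch of schema validation with a SAX parser, assuming JAXP 1.3 or later (bundled with Java 5, so newer than the JSDK 1.3 / SAX 1.0 combination asked about). JAXP is the Java API for XML Processing: it puts factory classes in front of SAX and DOM parsers rather than being a SAX version itself. The class name SchemaValidatingSax and the sample schema are hypothetical.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class SchemaValidatingSax {
    // Returns true if xml is valid against the W3C XML Schema in xsd.
    public static boolean isValid(String xsd, String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(xsd)));

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // validate while parsing
        SAXParser parser = factory.newSAXParser();

        DefaultHandler handler = new DefaultHandler() {
            // DefaultHandler silently ignores validation errors; rethrow them.
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e;
            }
        };
        try {
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return true;
        } catch (SAXParseException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='age' type='xs:int'/></xs:schema>";
        System.out.println(isValid(xsd, "<age>42</age>"));
        System.out.println(isValid(xsd, "<age>old</age>"));
    }
}
```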

How to use HTMLEditorKit's parser like a SAX parser?

How does one do it? I know you need to override the getParser method. But the ParserCallback's handleStartTag, handleEndTag etc. expect a parameter pos of type int. Needless to say, pos is undocumented. I have no idea what value to pass. If I pass -1, I get some exceptions. Does anyone have a working code snippet they could share?
Thanks a bunch
Narayanan

ParserCallback is like an event listener.
Take for example a MouseListener. You don't invoke the mouseClicked(...) method. You add code to the mouseClicked(...) method to respond to the mouseClicked event.
Same thing with the ParserCallback. You don't invoke the handleStartTag(...) method. As the parser works through an HTML document, it notifies your ParserCallback whenever it finds a start tag, telling you the tag it found and the position of the tag in the document.
Here is a simple example that just displays all the text in the document:
import java.io.*;
import java.net.*;
import javax.swing.text.html.parser.*;
import javax.swing.text.html.*;

public class ParserCallbackText extends HTMLEditorKit.ParserCallback
{
     // Called by the parser for each run of text it finds in the document.
     public void handleText(char[] data, int pos)
     {
          System.out.println( data );
     }

     public static void main(String[] args)
          throws Exception
     {
          Reader reader = getReader(args[0]);
          ParserCallbackText parser = new ParserCallbackText();
          // The last argument tells the parser to ignore any charset directive.
          new ParserDelegator().parse(reader, parser, true);
     }

     static Reader getReader(String uri)
          throws IOException
     {
          // Retrieve from Internet.
          if (uri.startsWith("http:"))
          {
               URLConnection conn = new URL(uri).openConnection();
               return new InputStreamReader(conn.getInputStream());
          }
          // Retrieve from file.
          else
               return new FileReader(uri);
     }
}
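To answer the question about pos directly: you override handleStartTag and the parser passes pos in, so you never have to choose a value yourself. A minimal sketch, assuming the Swing HTML parser in the standard JDK; the class name TagLister and the helper listTags are hypothetical. Tags without content (such as <br>) arrive through handleSimpleTag instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class TagLister extends HTMLEditorKit.ParserCallback {
    // Names of the start tags seen, in document order.
    private final List<String> tags = new ArrayList<>();

    @Override
    public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        // The parser supplies pos; we only record and print it.
        tags.add(t.toString());
        System.out.println("<" + t + "> at position " + pos);
    }

    public static List<String> listTags(String html) throws IOException {
        TagLister callback = new TagLister();
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return callback.tags;
    }

    public static void main(String[] args) throws IOException {
        listTags("<html><body><p>Hello</p><table><tr><td>x</td></tr></table></body></html>");
    }
}
```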

Problem parsing with SAX2

Hi all,
I am new to SAX2 Parsing and I'm having trouble getting my application to parse an xml. My code looks something like this:
XMLReader xr = XMLReaderFactory.createXMLReader("org.apache.xerces.parsers.SAXParser");
ContentHandler contentHandler = new DefaultHandler();
ErrorHandler errorHandler = new DefaultHandler();
xr.setContentHandler(contentHandler);
xr.setErrorHandler(errorHandler);
System.out.println("about to parse");
xr.parse(new InputSource(template));
System.out.println("done parsing");
I have implemented the startDocument, startElement, etc. methods, but I'm not sure whether it's getting into those methods. I have System.outs in them and they are not being written to the console, yet I don't see any exceptions being thrown by the parse method; I get to the last line of the code, where I print out "done parsing". Any ideas what's going on? Thanks in advance.

ContentHandler contentHandler = new DefaultHandler();
xr.setContentHandler(contentHandler);
You're creating a new DefaultHandler() to receive the parser events, not whatever object you've implemented the ContentHandler methods in yourself.
Pete
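In other words, hand the XMLReader an instance of your own class. A minimal sketch, assuming the JDK's built-in SAX parser instead of the explicitly named Xerces class; MyHandler and countElements are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {
    private int elementCount = 0;

    @Override
    public void startDocument() {
        System.out.println("startDocument called");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elementCount++;
        System.out.println("startElement: " + qName);
    }

    public static int countElements(String xml) throws Exception {
        XMLReader xr = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Register YOUR handler, not a plain new DefaultHandler().
        MyHandler handler = new MyHandler();
        xr.setContentHandler(handler);
        xr.setErrorHandler(handler);
        xr.parse(new InputSource(new StringReader(xml)));
        return handler.elementCount;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<a><b/><c/></a>") + " elements");
    }
}
```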

Problem with SAX parser - entity must finish with a semi-colon

Hi,
I'm pretty new to the complexities of using SAXParserFactory and its cousins XMLReaderAdapter, HTMLBuilder, HTMLDocument, entity resolvers and the like, so I wondered if someone could give me a hand with this problem.
In a nutshell, my code is really nothing more than a glorified HTML parser - a web page editor, if you like. I read in an HTML file (only one that my software has created in the first place), parse it, then produce a Swing representation of the various tags I've parsed from the page and display this on a canvas. So, for instance, I would convert a simple <TABLE> of three rows and one column, via an HTMLTableElement, into a Swing JPanel containing three JLabels, suitably laid out.
I then allow the user to amend the values of the various HTML attributes, and I then write the HTML representation back to the web page.
It works reasonably well, albeit a bit heavy on resources. Here's a summary of the code for parsing an HTML file:
htmlBuilder = new HTMLBuilder();
parserFactory = SAXParserFactory.newInstance();
parserFactory.setValidating(false);
parserFactory.setNamespaceAware(true);
FileInputStream fileInputStream = new FileInputStream(htmlFile);
InputSource inputSource = new InputSource(fileInputStream);
DoctypeChangerStream changer = new DoctypeChangerStream(inputSource.getByteStream());
changer.setGenerator(
   new DoctypeGenerator() {
      public Doctype generate(Doctype old) {
         return new DoctypeImpl(
               old.getRootElement(),
               old.getPublicId(),
               old.getSystemId(),
               old.getInternalSubset());
      }
   });
resolver = new TSLLocalEntityResolver("-//W3C//DTD XHTML 1.0 Transitional//EN", "xhtml1-transitional.dtd");
readerAdapter = new XMLReaderAdapter(parserFactory.newSAXParser().getXMLReader());
readerAdapter.setDocumentHandler(htmlBuilder);
readerAdapter.setEntityResolver(resolver);
readerAdapter.parse(inputSource);
htmlDocument = htmlBuilder.getHTMLDocument();
htmlBody = (HTMLBodyElement)htmlDocument.getBody();
traversal = (DocumentTraversal)htmlDocument;
walker = traversal.createTreeWalker(htmlBody, NodeFilter.SHOW_ELEMENT, null, true);
rootNode = new DefaultMutableTreeNode(new WidgetTreeRootNode(htmlFile));
createNodes(walker);
However, I'm having a problem parsing a piece of HTML for a streaming video widget. The key part of this HTML is as follows:
            <object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
              id="client"
        width="100%"
        height="100%"
              codebase="http://fpdownload.macromedia.com/get/flashplayer/current/swflash.cab">
              <param name="movie" value="client.swf?user=lkcl&stream=stream2&streamtype=live&server=rtmp://192.168.250.206/oflaDemo" />
         etc....
You will see that the <param> tag in the HTML has a value attribute which is a URL plus four URL parameters. It looks absolutely standard, and in fact works correctly in a browser. However, when my readerAdapter.parse() method gets to this point, it throws an exception saying that there should be a semicolon after the entity 'stream'. I can see what's happening: the SAXParser thinks that the ampersand marks the start of a new entity, so when it finds '&stream' it expects it to finish with a semicolon, just like other HTML character entities. The only way I can get the parser past this point is to encode all the relevant ampersands as %26, but then the web page stops working! Aaargh....
Can someone explain what my options are for getting around this problem? Some property I can set on the parser? A different DTD? Not using SAX at all? Overriding the parser's exception handler? A completely different approach?!
Could you provide a simple example to explain what you mean?
Thanks in anticipation!

You probably don't have the ampersands in your "value" attribute escaped properly. It should look like this:
value="client.swf?user=lkcl&amp;stream=..."
Most HTML processors (e.g. browsers) will overlook that omission, because almost nobody does it right when they are generating HTML by hand, but XML processors won't.
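A quick way to see the difference: the escaped form parses, and the parser hands back a plain & in the attribute value, while the raw form is rejected. A minimal sketch, assuming the JDK's built-in SAX parser; the class name AmpersandDemo and the shortened attribute values are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class AmpersandDemo {
    // Parses a single <param> element and returns its value attribute,
    // or null if the document is not well-formed.
    public static String paramValue(String xml) throws Exception {
        final String[] value = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("param")) {
                    value[0] = atts.getValue("value");
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXParseException e) {
            System.out.println("Parse error: " + e.getMessage());
            return null;
        }
        return value[0];
    }

    public static void main(String[] args) throws Exception {
        // Escaped: well-formed; the parser decodes &amp; back to &.
        System.out.println(paramValue(
            "<param name=\"movie\" value=\"client.swf?user=lkcl&amp;stream=stream2\"/>"));
        // Raw &: rejected, because "&stream" starts an unterminated entity reference.
        System.out.println(paramValue(
            "<param name=\"movie\" value=\"client.swf?user=lkcl&stream=stream2\"/>"));
    }
}
```

The browser still receives the decoded URL, so escaping the ampersands (rather than encoding them as %26) keeps the page working.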

Problem Using Sax parser with Eclipse 3.0 and command line

Hi,
I am parsing an XML file with SAX. When I run my program on the command line everything is OK and I get the right results from parsing.
But if I run the program in Eclipse 3.0 (the same Java code) I get a different (wrong) result.
Does anybody know what the reason for this could be? Is Eclipse using a different XML parser, and if so, where can I change the parser?
It would be very kind if somebody could give me a reason for this strange behaviour.
Thanks in advance

I have solved my problem.
On the command line I used JRE 1.4 and in Eclipse I used JRE 1.5.
I think JRE 1.5 uses a different XML parser, so I got a different result (JRE 1.4 bundled the Crimson parser, while JRE 1.5 switched to a Xerces-based implementation).
If I use JRE 1.4 in Eclipse, I get the same result as on the command line.
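To confirm which JAXP implementation a given JRE or classpath actually picks up, printing the factory classes is a quick diagnostic: run it both from the command line and from Eclipse and compare. A sketch; the class name WhichParser is hypothetical and the printed class names depend on the JRE. If needed, the implementation can be pinned with the javax.xml.parsers.SAXParserFactory system property.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {
    // Reports the Java version and the implementation classes JAXP resolves.
    public static String describe() {
        return "java.version=" + System.getProperty("java.version")
            + "\nSAXParserFactory=" + SAXParserFactory.newInstance().getClass().getName()
            + "\nDocumentBuilderFactory=" + DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```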

Problem in using SAX parser.

Hi all,
I have a problem using the SAX parser.
My XML looks like this:
<authorizer>
<first-name>HP</first-name>
<last-name>Services</last-name>
<phone>800-22-1984</phone>
</authorizer>
<destination>
<first-name>John</first-name>
<last-name>Doe</last-name>
<company>John Doe Enterprises, Inc.</company>
<department>Manufacturing</department>
<phone>800-555-1234</phone>
<address>
<street-one>1654 Peachtree Str</street-one>
<street-two>Suite Y</street-two>
<city>Atlanta</city>
<province>GA</province>
<country>US</country>
<postal-code>30326</postal-code>
</address>
</destination>
The relevant part of my SAX parser code is:
public void startElement (String name, AttributeList attrs)
    throws SAXException
{
    accumulator.setLength(0);
}

public void characters (char buf [], int offset, int len)
    throws SAXException
{
    accumulator.append(buf, offset, len);
}

public void endElement (String name)
    throws SAXException
{
    if (name.equals("first-name"))
        firstname = accumulator.toString().trim();
    if (name.equals("last-name"))
        lastname = accumulator.toString().trim();
}
My problem is that I have to store the values of first-name and last-name,
but they appear in both the
<authorizer> </authorizer> tag and the
<destination> </destination> tag.
I need to retrieve the authorizer's first name and last name and
the destination's first name and last name,
i.e. I need to store authorizerFirstName, authorizerLastName,
destinationFirstName and destinationLastName.
Please let me know how to do that.
Thanks in advance.
Pooja.

Hi Pooja,
I think you are using DocumentHandler for parsing. It's deprecated; try using ContentHandler instead. You get the element's name in startElement. Say, for example,
<firstname>sdfs</firstname>
startElement is called for firstname,
and the next method invoked is usually characters, which carries the text of the element (although text can arrive split across several characters calls). I am sending sample code for your problem; try using it.
boolean m_boolinAuth = false;
boolean m_boolinDest = false;
boolean m_bFirstName = false;
boolean m_bLastName = false;
String m_strAuthFirstName, m_strAuthLastName;
String m_strDestFirstName, m_strDestLastName;

public void startElement(String namespaceURI, String elementName, String qName, Attributes atts)
{
    // Track which parent element we are inside (names as in your XML).
    if (qName.equals("authorizer")) {
        m_boolinAuth = true;
        m_boolinDest = false;
    } else if (qName.equals("destination")) {
        m_boolinDest = true;
        m_boolinAuth = false;
    }
    // Flag the child elements whose text we want.
    if (qName.equals("first-name"))
        m_bFirstName = true;
    if (qName.equals("last-name"))
        m_bLastName = true;
}

public void characters(char[] ch, int start, int length)
{
    // Assumes each name arrives in a single characters() call;
    // accumulate in a StringBuilder if that may not hold.
    String str = new String(ch, start, length);
    if (m_bFirstName) {
        if (m_boolinAuth)
            m_strAuthFirstName = str;
        else if (m_boolinDest)
            m_strDestFirstName = str;
        m_bFirstName = false;
    }
    if (m_bLastName) {
        if (m_boolinAuth)
            m_strAuthLastName = str;
        else if (m_boolinDest)
            m_strDestLastName = str;
        m_bLastName = false;
    }
}
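An alternative that avoids one boolean per element is to keep a stack of open elements, so each element knows its parent, and to buffer text as recommended in the first thread. A minimal sketch, assuming the two blocks are wrapped in a single root (here a hypothetical <order>, since an XML document needs exactly one root); the class name NameCollector and the "parent/child" map keys are illustrative.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NameCollector extends DefaultHandler {
    // Stack of open element names, so we know each element's parent.
    private final Deque<String> open = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    // Keys like "authorizer/first-name" map to the element text.
    private final Map<String, String> values = new HashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.push(qName);
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop(); // remove this element; the top is now its parent
        String parent = open.peek();
        if ((qName.equals("first-name") || qName.equals("last-name"))
                && ("authorizer".equals(parent) || "destination".equals(parent))) {
            values.put(parent + "/" + qName, text.toString().trim());
        }
    }

    public static Map<String, String> collect(String xml) throws Exception {
        NameCollector handler = new NameCollector();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order>"
            + "<authorizer><first-name>HP</first-name><last-name>Services</last-name></authorizer>"
            + "<destination><first-name>John</first-name><last-name>Doe</last-name></destination>"
            + "</order>";
        Map<String, String> v = collect(xml);
        System.out.println("authorizerFirstName = " + v.get("authorizer/first-name"));
        System.out.println("destinationLastName = " + v.get("destination/last-name"));
    }
}
```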

Problem parsing XML with schema when extracted from a jar file

I am having a problem parsing XML with a schema, both of which are extracted from a jar file. I am using ZipFile to get InputStream objects for the appropriate ZipEntry objects in the jar file. My XML is encrypted, so I decrypt it to a temporary file. I am then attempting to parse the temporary file with the schema using DocumentBuilder.parse.
I get the following exception:
org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element '<root element name>'
This was all working OK before I jarred everything (i.e. when I was using standalone files, rather than InputStreams retrieved from a jar).
I have output the retrieved XML to a file and compared it with my original source and they are identical.
I am baffled because the nature of the exception suggests that the schema has been read and parsed correctly but the XML file is not parsing against the schema.
Any suggestions?
The code is as follows:
public void open(File input) throws IOException, CSLXMLException {
    InputStream schema = ZipFileHandler.getResourceAsStream("<jar file name>", "<schema resource name>");
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = null;
    try {
      factory.setNamespaceAware(true);
      factory.setValidating(true);
      // Presumably the standard JAXP property names:
      // JAXP_SCHEMA_LANGUAGE = "http://java.sun.com/xml/jaxp/properties/schemaLanguage"
      // W3C_XML_SCHEMA       = "http://www.w3.org/2001/XMLSchema"
      // JAXP_SCHEMA_SOURCE   = "http://java.sun.com/xml/jaxp/properties/schemaSource"
      factory.setAttribute(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
      factory.setAttribute(JAXP_SCHEMA_SOURCE, schema);
      builder = factory.newDocumentBuilder();
      builder.setErrorHandler(new CSLXMLParseHandler());
    } catch (Exception builderException) {
      throw new CSLXMLException("Error setting up SAX: " + builderException.toString());
    }
    Document document = null;
    try {
      document = builder.parse(input);
    } catch (SAXException parseException) {
      throw new CSLXMLException(parseException.toString());
    }
}

I was originally using getSystemResource, which worked fine until I jarred the application. The problem appears to be that resources returned from a jar file cannot be used in the same way as resources returned directly from the file system. You have to use the ZipFile class (or its JarFile subclass) to locate the ZipEntry in the jar file and then use ZipFile.getInputStream(ZipEntry) to convert it to an InputStream. I have seen example code where an InputStream is used for the JAXP_SCHEMA_SOURCE attribute but, for some reason, this did not work with the InputStream returned by ZipFile.getInputStream. Like you, I have also seen examples that use a URL, but they appear to be URLs that point to a file, not URLs that point to an entry in a jar file.
Maybe there is another way around this, but writing to a file works, and I use File.deleteOnExit() to ensure things are tidied up afterwards.
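One other route that may avoid the temporary file, untested against the poster's setup: build a javax.xml.validation.Schema directly from the stream and attach it with DocumentBuilderFactory.setSchema, instead of setValidating plus the JAXP_SCHEMA_* attributes. A sketch, assuming JAXP 1.3 or later; in the poster's code the streams would come from ZipFile.getInputStream, and loadValidated is a hypothetical name. The optional system ID is only used to resolve relative xs:include or xs:import references inside the schema.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class StreamSchemaLoader {
    // Parses xml and validates it against the schema read from schemaStream.
    public static Document loadValidated(InputStream schemaStream, String schemaSystemId,
                                         InputStream xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(schemaStream, schemaSystemId));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema); // no setValidating(true) or schema attributes needed
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validation errors as fatal
            }
        });
        return builder.parse(xml);
    }

    public static void main(String[] args) throws Exception {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                   + "<xs:element name='root' type='xs:string'/></xs:schema>";
        // No relative references in this tiny schema, so no system ID is needed.
        Document doc = loadValidated(
            new ByteArrayInputStream(xsd.getBytes(StandardCharsets.UTF_8)), null,
            new ByteArrayInputStream("<root>ok</root>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```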

Sax parse with offset

Hi,
is there a way to start parsing an XML file with SAX from an offset? The file I am parsing is over 2 GB and it is parsed on multiple machines, each machine parsing 1/n of the file (n being the number of machines). I managed to do this simply by counting the start elements of records and parsing a record only if the criteria are met. However, for the last part the parser needs to read the whole file, and only when it gets to the desired position does it start parsing. Is there a way to do this using some kind of offset? The total number of records is known.
Thanks

Hmm, OK, so here is the whole problem. All of this is done and working properly; I just want to speed it up if possible.
In the beginning, parsing was done on a single machine, and since the file is pretty big it took a lot of time. So we changed the code so that it can be started on several machines, each one processing one part of the same file (each instance has its own copy). The logic is: since the number of records in the XML is known, start and stop markers are calculated from the computer ID and the total number of computers doing the parsing, plus some safety margins to ensure that every record is processed. So each instance of the app knows when to start and stop parsing: it goes through the file counting the records it has seen, and when it reaches the element representing the start marker, parsing starts and the data is sent where needed. Parsing is done when SAX reaches the stop marker.
Now I want to speed up locating the start marker, since, e.g., for the computer processing the last part of the file, the app must go through the entire file. I was asking whether there is a way to tell SAX where to "enter" the file, so to speak (e.g. at the 155th element, or by byte position, simply moving the pointer to byte xxxx),
and I need help on this matter since I have no idea how it can be done.

Can I parse non-wellformed XML with SAX at all?

Hi all,
I was wondering whether it's possible at all to parse XML that is not well-formed with SAX,
e.g. an HTML file that doesn't close tags and stuff like that.
I tried implementing the fatalError() method of the handler in a way that consumes the exception but does not rethrow it.
I also tried setting the validation property to false. Both with no success.
Any help would be appreciated.
Thx
Philipp

Your experiments tell you the answer.
If you have HTML tag soup, why not just run it through JTidy or HTMLTidy to make it into well-formed XHTML?

NullPointerException with SAX

I have developed a CSV-to-XML parser using JAXP with SAX events to parse the CSV file into a DOM tree.
Inside the parse() method I have the following code:
public void parse(InputSource input) throws IOException, SAXException
{
    BufferedReader br = null;
    if( input.getCharacterStream() != null )
        br = new BufferedReader( input.getCharacterStream() );
    else if( input.getByteStream() != null )
        br = new BufferedReader( new InputStreamReader( input.getByteStream() ) );
    else if( input.getSystemId() != null )
    {
        URL url = new URL( input.getSystemId() );
        br = new BufferedReader( new InputStreamReader( url.openStream() ) );
    }
    else
        throw new SAXException( "Invalid InputSource object" );
    ContentHandler ch = getContentHandler();
    ch.startDocument();
    ch.startElement( "", "", "file", new AttributesImpl() );
    this.parseInput( br );
    ch.endElement( "", "", "file" );
    ch.endDocument();
}
The problem is that whenever the app gets to the ch.startDocument() statement it throws a java.lang.NullPointerException. I have no idea why this is happening; I have tested the very same code with the Xalan 2 and Xerces 2 parsers and it works without problems. But using the Oracle XML Parser v2 throws the exception.
Is this a bug? Should I set some of the Transformer's attributes to a specific value to avoid this? Where could I find more info on processing SAX events?
Thanks,
Fedro

Fedro,
Did you try it using XDK v10?

Edit an XML file with SAX

Dear all, I am so confused...
I have been trying for the last few days to understand how SAX works... The only thing I understood is:
DefaultHandler handler = new Echo01();
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
    out = new OutputStreamWriter(System.out, "UTF8");
    SAXParser saxParser = factory.newSAXParser();
    saxParser.parse(file, handler);
} catch (Throwable t) {
    t.printStackTrace();
}
System.exit(0);
OK, I give the SAXParser the XML file and a handler. The parser parses and fires events that the handler catches. By implementing some handler interface or overriding the methods of an existing handler (e.g. the DefaultHandler class) I get to do stuff...
But still, suppose I have implemented the startElement() method of the DefaultHandler class and I know that the parser is currently positioned on an element, e.g. <name>bob</name>. How do I get the value of the element, and if I manage to do that, how can I replace "bob" with "tom"?
I would really appreciate any help given... just don't recommend http://java.sun.com/webservices/jaxp/dist/1.1/docs/tutorial/ because although there is interesting stuff in there, it does not solve my problem...

Maybe SAX is not the right tool for you.
With SAX, you implement methods like startElement and characters that get called as XML data is encountered by the parser. If you want to catch it or not, the SAX parser does not care. In your case, the "bob" part will be passed in one or more calls to characters. To safely process the data, you need to do something like build a StringBuffer or StringBuilder in the constructor of the class, and then in the startElement, if the name is one you want to read, set the length to zero. In the characters method, append the data to the StringBuilder or StringBuffer. In the endElement, do a toString to keep the data wherever you want.
This works for simple XML, but may need to be enhanced if you have nested elements with string values that contain other elements.
On the other hand, if your file is not huge, you could use DOM. With DOM, (or with JDOM, and I would expect with Dom4J -- but I have only used the first two) you do a parse and get a Document object with the entire tree. That allows you to easily (at least it is easy once you figure out how to do it) find a node like the "name" element and change the Text object that is its child from a value of "bob" to "tom". With DOM, you can then serialize the modified Document tree and save it as an XML file. SAX does not have any way to save your data. That burden falls to you entirely.
Dave Patterson
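The DOM route described above can be sketched with the JDK's built-in DOM parser and an identity Transformer for saving. A minimal sketch; the <person> wrapper and the replaceName helper are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EditName {
    // Replaces the text of every <name> element whose text equals oldValue.
    public static String replaceName(String xml, String oldValue, String newValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList names = doc.getElementsByTagName("name");
        for (int i = 0; i < names.getLength(); i++) {
            if (names.item(i).getTextContent().equals(oldValue)) {
                names.item(i).setTextContent(newValue); // replaces the child Text node
            }
        }
        // Serialize the modified tree; a FileWriter in the StreamResult would save it to disk.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(replaceName("<person><name>bob</name></person>", "bob", "tom"));
    }
}
```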

Processing unfinished stream with SAX

Hi,
I'm writing a kind of Jabber plugin in Java. I've decided to use SAX for parsing server responses. However, I've run into a problem with SAX:
saxParser.parse(inputStream, this);
The problem is that events (such as startElement) are only fired after the connection is closed (i.e. after the stream's read method returns -1). Is there any way to force SAX to raise an event as soon as a tag is read?
Any help will be greatly appreciated.
Regards
Versor
Edited by: versor on Nov 2, 2007 3:20 AM

versor wrote:
... Problem is, that events (such as startElement) are called after the connection (streams read method returns with -1) is closed. Is there any way to force sax to raise event as soon as the tag is read? ...
Fully circumvent the problem parsing the buffered stream.

Xml parsing with Java

Hi ..!
I have a small problem, friends; I'd be glad if any of you could help me resolve it.
I am quite new to parsing XML with the SAX and DOM Java parsers.
Extracting an element name, attribute name or attribute value is quite simple in Java.
But suppose I want to extract the value between the tags of an element: how can I do that? Either there is a simple method that I have missed, or there is a tedious procedure that I am ignorant of.
e.g. <Node1> this is my name </Node1>
Java source output: this is my name.
Thanks to you people for co-operating.
Take care
Akshat

In SAX you can do it like this:
     boolean nodeflag = false;

     public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException
     {
          if (qName.equals("Node1"))
               nodeflag = true;
     }

     public void endElement(String uri, String localName, String qName) throws SAXException
     {
          if (qName.equals("Node1"))
               nodeflag = false;
     }

     // May be called several times per element; each call gets one chunk of text.
     public void characters(char[] ch, int start, int length) throws SAXException
     {
          String s = new String(ch, start, length);
          if (nodeflag)
               if (!s.trim().equals(""))
                    System.out.println(s.trim());
     }
Here, in the characters method, you can work with the data.
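Since the question also mentions DOM: there, the text between the tags is available in one call, Node.getTextContent() (DOM Level 3, available since Java 5). A minimal sketch using the corrected <Node1> example; the class NodeText and helper textOf are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NodeText {
    // Returns the trimmed text content of the first element with the given tag name.
    public static String textOf(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("<Node1> this is my name </Node1>", "Node1"));
    }
}
```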
