SAX & Entities

Hello!
I have the following problem: I am trying to parse a fairly large XML file (about 40 MB) with the SAX parser. All special characters, such as German umlauts, are represented by entities like '&Uuml;'. In the characters() method of my handler (extended from DefaultHandler), the text content is cut off at the point where one of these entities appears. So SAX does not seem to replace these entities, and because they are not delivered with the text, I cannot replace them myself.
Does anyone have an idea how to fix this? Do I have to add an EntityResolver or something like that? I'm quite new to SAX...
Greetings,
SirAlexius

I ran into this problem, and found the solution here:
http://forum.java.sun.com/thread.jspa?forumID=34&threadID=654306
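
The linked thread is not reproduced here. A common cause of this symptom is that a SAX parser may deliver the text of one element in several characters() calls, often splitting at entity references, so a handler that keeps only one call appears to lose text. A minimal sketch, assuming that is the cause; the class name TextCollector, the element name r and the inline DTD that declares Uuml are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers text across characters() calls.
class TextCollector extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    private String lastText = "";

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        buffer.setLength(0); // new element: start with an empty buffer
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        lastText = buffer.toString(); // the text is complete only here
    }

    String lastText() {
        return lastText;
    }

    // Parses the given document and returns the text of the last element closed.
    static String parse(String xml) throws Exception {
        TextCollector handler = new TextCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lastText();
    }

    public static void main(String[] args) throws Exception {
        // The entity is declared in the internal subset so the parser can expand it.
        String xml = "<!DOCTYPE r [<!ENTITY Uuml \"&#220;\">]><r>A&Uuml;B</r>";
        System.out.println(parse(xml));
    }
}
```

The sketch only covers elements with plain text content. If the entities are not declared in a DTD the parser can read, buffering alone does not help, because the parser has no replacement text to deliver.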

Similar Messages

  • Resolving entities with crimson/SAX

    Hello,
    I am using the Crimson parser and SAX to read XML files. The DTD below declares two entities. The second one (<!ENTITY part "downtown" >) is resolved automatically. The first one (<!ENTITY net SYSTEM "entity.txt" NDATA txt>) is not. I would expect output like: "start element: Billy Here speaks entity.txt".
    Why is the data from entity.txt not included in the XML file? Is NDATA ("non-XML data") the reason? How can I include it?
    DTD:
    <!ENTITY net SYSTEM "entity.txt" NDATA txt>
    <!NOTATION txt SYSTEM "text/plain">
    <!ENTITY part "downtown" >
    <!ELEMENT Customer (Address)*>
    <!ATTLIST Customer number ENTITY #REQUIRED>
    <!ELEMENT Address (Title?,Name)>
    <!ELEMENT Title (#PCDATA)>
    <!ELEMENT Name (#PCDATA)>
    XML:
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <!DOCTYPE Customer SYSTEM "file:///C:/Programs/Eclipse/eclipse/workspace/Saxxer/Customer.dtd">
    <Customer number="net">
    <Address>
    <Name>Billy &part;</Name>
    </Address>
    </Customer>
    entity.txt
    Here speaks entity.txt
    Program:
    import java.io.FileReader;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.DefaultHandler;
    public class Saxxer extends DefaultHandler {
        public static void main(String[] args) throws Exception {
            XMLReader xr = new org.apache.crimson.parser.XMLReaderImpl();
            Saxxer handler = new Saxxer();
            xr.setContentHandler(handler);
            xr.setErrorHandler(handler);
            // Parse every file named on the command line
            for (int i = 0; i < args.length; i++) {
                FileReader r = new FileReader(args[i]);
                xr.parse(new InputSource(r));
            }
        }
        public void startDocument() {
            System.out.println("Start document");
        }
        public void endDocument() {
            System.out.println("End document");
        }
        //public InputSource resolveEntity(,java.lang.String systemId);
        public void startElement(String uri, String name, String qName, Attributes atts) {
            if ("".equals(uri)) {
                System.out.println("Start element: " + qName);
                for (int cnt = 0; cnt < atts.getLength(); cnt++) {
                    System.out.println("Attribut " + atts.getLocalName(cnt) + ": " + atts.getValue(cnt));
                }
            } else {
                System.out.println("Start element: {" + uri + "}" + name);
            }
        }
        public void endElement(String uri, String name, String qName) {
            if ("".equals(uri)) {
                System.out.println("End element: " + qName);
            } else {
                System.out.println("End element: {" + uri + "}" + name);
            }
        }
        public void skippedEntity(java.lang.String name) {
            System.out.println("SKIPPTED ENTITY");
        }
        public void unparsedEntityDecl(java.lang.String name,
                java.lang.String publicId,
                java.lang.String systemId,
                java.lang.String notationName) {
            System.out.println("Unparsed Entity Declaration");
        }
        public void notationDecl(java.lang.String name,
                java.lang.String publicId,
                java.lang.String systemId) {
            System.out.println("notation Declaration");
        }
        public InputSource resolveEntity(java.lang.String publicId,
                java.lang.String systemId) {
            System.out.println("resolve Entity");
            return null;
        }
        // Print character data with special characters shown as escapes
        public void characters(char ch[], int start, int length) {
            System.out.print("Characters: \"");
            for (int i = start; i < start + length; i++) {
                switch (ch[i]) {
                case '\\':
                    System.out.print("\\\\");
                    break;
                case '"':
                    System.out.print("\\\"");
                    break;
                case '\n':
                    System.out.print("\\n");
                    break;
                case '\r':
                    System.out.print("\\r");
                    break;
                case '\t':
                    System.out.print("\\t");
                    break;
                default:
                    System.out.print(ch[i]);
                    break;
                }
            }
            System.out.print("\"\n");
        }
    }
    Regards,
    Massala

    One more question: although I declare the ISO-8859-1 encoding in the header of the XML file, I get the warning: declared coding "ISO-8859-1" does not comply with the actual used coding "Cp1252".
    Why is that?
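
An entity declared with NDATA is an unparsed entity: the parser does not read entity.txt, it only reports the declaration through DTDHandler.unparsedEntityDecl, and the program above registers a ContentHandler and an ErrorHandler but no DTDHandler. A minimal sketch, assuming the application wants to look up the file behind the ENTITY-typed attribute and read it itself; the class name UnparsedEntities and the inline document are made up for illustration, and the system identifier may be reported as an absolute URI:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records unparsed (NDATA) entity declarations.
class UnparsedEntities extends DefaultHandler {
    // entity name -> system identifier, as reported by the parser
    final Map<String, String> systemIds = new LinkedHashMap<>();
    // entity name -> notation name
    final Map<String, String> notations = new LinkedHashMap<>();

    @Override
    public void unparsedEntityDecl(String name, String publicId,
                                   String systemId, String notationName) {
        systemIds.put(name, systemId);
        notations.put(name, notationName);
    }

    static UnparsedEntities collect(String xml) throws Exception {
        UnparsedEntities handler = new UnparsedEntities();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setDTDHandler(handler); // without this, the declarations are never reported
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE Customer ["
                + "<!NOTATION txt SYSTEM \"text/plain\">"
                + "<!ENTITY net SYSTEM \"entity.txt\" NDATA txt>"
                + "<!ELEMENT Customer EMPTY>"
                + "<!ATTLIST Customer number ENTITY #REQUIRED>"
                + "]><Customer number=\"net\"/>";
        UnparsedEntities result = collect(xml);
        // The application can now map the attribute value "net" to a file and read it itself.
        System.out.println(result.systemIds + " " + result.notations);
    }
}
```

The encoding warning most likely comes from FileReader, which decodes the file with the platform default (Cp1252 on that machine) before the parser sees it. Passing a byte stream or a system ID in the InputSource lets the parser apply the declared ISO-8859-1 itself.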

  • XML SAX dtd Validation Problem

    Hi,
    I’m having problems getting an XML document to validate within WebLogic 8.1. I am trying to parse a document that references both a DTD and an XSD. Both the schema and the DTD reference need to be substituted so that they use local paths. I specify the schema the parser should use and have created an EntityResolver to change the DTD reference.
    When this runs as a standalone app from Eclipse, the file parses and validates without a problem. When deployed to the app server, the process seems to be unable to read the contents of the DTD. It's not that it cannot find the file (no FileNotFoundException is thrown, although I can provoke one by deleting the DTD); rather, it seems to find no declared elements.
    My initial thought was that the code didn’t have access to read the DTD from its location on disk. To check, I moved the DTD into the deployed WAR and referenced it as a resource. The problem still persists.
    Code Snippet:
    boolean isValid = false;
    try {
        // Create and configure factory
        SAXParserFactory factory = SAXParserFactoryImpl.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        // To be notified of validation errors in the XML document,
        // add a custom error handler to the document builder
        PIMSFeedFileValidationHandler handler
                = new PIMSFeedFileValidationHandler();
        // Create and configure parser
        SAXParser parser = factory.newSAXParser();
        parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        parser.setProperty(NAMESPACE_PROPERTY_KEY, getSchemaFilePath());
        // Set reader with EntityResolver for the DTD
        XMLReader xmlReader = parser.getXMLReader();
        xmlReader.setEntityResolver(new SAXEntityResolver(this.dtdPath));
        // Convert file to URL, as it is a remote file
        URL url = super.getFile().toURL();
        // Open an input stream and parse
        InputStream is = url.openStream();
        xmlReader.setErrorHandler(handler);
        xmlReader.parse(new InputSource(is));
        is.close();
        // Get the result of parsing the document by checking the
        // error handler's isValid property
        isValid = handler.isValid();
        if (!isValid) {
            LOGGER.warn(handler.getMessage());
        }
        LOGGER.debug("XML file is valid XML? " + isValid);
    } catch (ParserConfigurationException e) {
        LOGGER.error("Error parsing file", e);
    } catch (SAXException e) {
        LOGGER.error("Error parsing file", e);
    } catch (IOException e) {
        throw new FeedException(e);
    }
    return isValid;
    See the log output below for a little more info.
    2005-01-28 10:24:09,217 [DEBUG] [file] - Attempting validation of file 'cw501205.wa1.xml' with schema at 'C:/pims-feeds/hansard/schema/hansard-v1-9.xsd'
    2005-01-28 10:24:09,217 [DEBUG] [file] - Entity Resolver is using DTD path file:C:/Vignette/runtime_services/8.1/install/common/nodemanager/VgnVCMServer/stage/pims-hansard/pims-hansard.war/WEB-INF/classes/com/morse/pims/cms/feed/sax/ISO-Entities.dtd
    2005-01-28 10:24:09,227 [DEBUG] [file] - Creating InputSource at: file:C:/Vignette/runtime_services/8.1/install/common/nodemanager/VgnVCMServer/stage/pims-hansard/pims-hansard.war/WEB-INF/classes/com/morse/pims/cms/feed/sax/ISO-Entities.dtd
    2005-01-28 10:24:09,718 [WARN ] [file] - org.xml.sax.SAXParseException: Element type "Hansard" must be declared.
    org.xml.sax.SAXParseException: Element type "Session" must be declared.
    org.xml.sax.SAXParseException: Element type "DailyRecord" must be declared.
    org.xml.sax.SAXParseException: Element type "Volume" must be declared.
    org.xml.sax.SAXParseException: Element type "Written" must be declared.
    org.xml.sax.SAXParseException: Element type "WrittenHeading" must be declared.
    org.xml.sax.SAXParseException: Element type "Introduction" must be declared.
    … continues for all the elements in the doc
    2005-01-28 10:24:10,519 [DEBUG] [file] - XML file is valid XML? false
    2005-01-28 10:24:10,519 [WARN ] [file] - Daily Part file 'cw501205.wa1.xml' was not valid XML and was not processed.
    Has anybody seen this behavior before with WebLogic, and if so, how did you resolve the issue?
    Thanks in advance,
    Adam

    It looks like you clicked on "Post" before you got around to explaining your problem. I don't see any error messages or any description of what was supposed to happen and what happened instead.
    Now, I don't know anything about XML Schema, but just guessing at how that unique name feature might be designed, and just guessing that your unique name is actually in the <userId> element, I would suggest that this:
    <xsd:unique name="un_name">
      <xsd:selector xpath="USER"/>
      <xsd:field xpath="."/>
    </xsd:unique>
    is at fault because it doesn't mention the <userId> element anywhere.

  • BUG? using own EntityResolver with SAX doesn't work

    Hello,
    I was experimenting with the oracle.xml.parser.XMLParser using
    the SAX interface.
    I've written a test program that instantiates a driver and
    registers my own handlers (which just print to System.out).
    I also have my own org.xml.sax.EntityResolver, it looks like
    this:
    public class SAXEntityResolver implements EntityResolver {
        public InputSource resolveEntity(String publicId,
                String systemId) throws SAXException, IOException {
            System.out.println("<<Call to resolveEntity>>");
            try { // assume it's a URL of some sort
                URL url = new URL(systemId);
                return new InputSource(url.openStream());
            } catch (MalformedURLException e1) {
                try { // it's not a URL, assume a file spec
                    FileInputStream fin = new FileInputStream(systemId);
                    return new InputSource(fin);
                } catch (FileNotFoundException e2) {
                    // don't understand it, let the parser handle it.
                    return null;
                }
            }
        }
    }
    when I parse the following xml file:
    <?xml version="1.0"?>
    <!DOCTYPE dinner SYSTEM "dinner.dtd">
    <dinner>
    <location planet="Earth">Alma 3</location>
    <time>12:30</time>
    <date>Vandaag</date>
    </dinner>
    The parser generates an error to my org.xml.sax.ErrorHandler
    which prints it to the screen. The output looks like this:
    [C:\temp\xml]java -cp c:\TEMP\xml\oracle\lib\xmlparser.jar;.
    SAXParseXML oracle.xml.parser.XMLParser dinner.xml
    Locater accepted: oracle.xml.parser.SAXLocator@6ba51a96
    document parsing start
    [error: Couldn't find external DTD 'dinner.dtd']
    element dinner start: null:4:1
    (other output follows with no more errors)
    It seems as if the Oracle XMLParser doesn't use my EntityResolver
    to resolve its external entities (the dinner.dtd file in this
    case, the file is indeed there, trust me!), otherwise it would
    have printed the message seen in the code above (<<Call to
    resolveEntity>>). If you're wondering how I configured the
    systemId in the SAX parser, here's how:
    File f=new File(args[1]);
    InputSource src=new InputSource(new FileInputStream(f));
    src.setSystemId(f.toURL().toString());
    p.parse(src);
    Can you tell me why this is? (I use NT4 with jdk 1.2)
    I've tested the same thing with the IBM, Microstar and Sun
    parsers, and they all seem to work fine with this example...
    Hope to hear from you! (cc in with mail please)
    Erwin.

    Thanks for the post. You have identified a bug which will be
    fixed in a maintenance release. Until then, you can pass a
    String rather than an InputSource to parse() in SAXParseXML.java
    as a workaround.
    Oracle XML Team
    http://technet.oracle.com
    Erwin Vervaet (guest) wrote:
    : Oracle XML Team wrote:
    : : Which version of the parser are you using? If not 1.0.0.3 (the
    : : latest) try that version. If the problem still exists it would
    : : help if you could provide your test program.
    : The readme.html in the xmlparser_v1_0_0_3.zip file (I downloaded it
    : on Monday 8/2/1999) says: 'Oracle XML Parser 1.0.0.3.0'.
    : So that's not the problem. Below are all the files of the test
    : program. The command I use to start the program is the following
    : (note that there cannot be a classpath clash problem! I use Sun
    : jdk1.2 on NT4 SP4):
    : [C:\temp\xml]dir
    : Volume in drive C is unlabeled Serial number is 2C90:8BDE
    : Directory of C:\temp\xml\*
    : 11/02/99 10:50 <DIR> .
    : 11/02/99 10:50 <DIR> ..
    : 9/02/99 16:34 <DIR> aelfred
    : 9/02/99 21:56 <DIR> oracle
    : 8/02/99 17:44 <DIR> xml-ea2
    : 8/02/99 14:10 <DIR> xml4j
    : 9/02/99 22:44 <DIR> xp
    : 9/02/99 16:42 215 dinner.dtd
    : 9/02/99 23:03 167 dinner.xml
    : 8/02/99 15:23 438 ParseXml.java
    : 11/02/99 10:50 2.402 SAXDocHandler.class
    : 9/02/99 21:14 1.585 SAXDocHandler.java
    : 11/02/99 10:50 1.129 SAXEntityResolver.class
    : 9/02/99 22:04 737 SAXEntityResolver.java
    : 11/02/99 10:50 976 SAXErrHandler.class
    : 9/02/99 15:39 495 SAXErrHandler.java
    : 11/02/99 10:50 1.261 SAXParseXML.class
    : 9/02/99 22:09 629 SAXParseXML.java
    : 10.034 bytes in 11 files and 7 dirs 12.800 bytes allocated
    : 201.152.512 bytes free
    : [C:\temp\xml]java -cp c:\temp\xml\oracle\lib\xmlparser.jar;.
    : SAXParseXML oracle.xml.parser.XMLParser dinner.xml
    : Here are the files:
    : //file SAXErrHandler.java
    : import org.xml.sax.*;
    : public class SAXErrHandler implements ErrorHandler {
    :     public void warning(SAXParseException exception) throws SAXException {
    :         System.err.println("[warning: " + exception + "]");
    :     }
    :     public void error(SAXParseException exception) throws SAXException {
    :         System.err.println("[error: " + exception + "]");
    :     }
    :     public void fatalError(SAXParseException exception) throws SAXException {
    :         System.err.println("[fatal error: " + exception + "]");
    :     }
    : }
    : //file SAXEntityResolver.java
    : import org.xml.sax.*;
    : import java.net.*;
    : import java.io.*;
    : public class SAXEntityResolver implements EntityResolver {
    :     public InputSource resolveEntity(String publicId, String systemId)
    :             throws SAXException, IOException {
    :         System.out.println("<<Call to resolveEntity>> " + publicId + " " + systemId);
    :         try { // assume it's a URL of some sort
    :             URL url = new URL(systemId);
    :             return new InputSource(url.openStream());
    :         } catch (MalformedURLException e1) {
    :             try { // it's not a URL, assume a file spec
    :                 FileInputStream fin = new FileInputStream(systemId);
    :                 return new InputSource(fin);
    :             } catch (FileNotFoundException e2) {
    :                 return null; // don't understand it, let the parser handle it.
    :             }
    :         }
    :     }
    : }
    : //file SAXDocHandler.java
    : import org.xml.sax.*;
    : public class SAXDocHandler implements DocumentHandler {
    :     private Locator locator = null;
    :     public void startDocument() throws SAXException {
    :         System.out.println("document parsing start");
    :     }
    :     public void setDocumentLocator(Locator locator) {
    :         System.out.println("Locater accepted: " + locator);
    :         this.locator = locator;
    :     }
    :     public void startElement(String name, AttributeList atts) throws SAXException {
    :         System.out.println("element " + name + " start: " + locate());
    :         for (int i = 0; i < atts.getLength(); i++) {
    :             System.out.println("attribute " + atts.getName(i) + "="
    :                     + atts.getValue(i) + " (" + atts.getType(i) + ")");
    :         }
    :     }
    :     public void characters(char[] ch, int start, int length) throws SAXException {
    :         System.out.println("char data: " + new String(ch, start, length));
    :     }
    :     public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
    :         System.out.println("ignoring some whitespace: " + new String(ch, start, length));
    :     }
    :     public void endElement(String name) throws SAXException {
    :         System.out.println("element " + name + " end: " + locate());
    :     }
    :     public void processingInstruction(String target, String data) throws SAXException {
    :         System.out.println("PI: " + target + "=" + data);
    :     }
    :     public void endDocument() throws SAXException {
    :         System.out.println("document parsing end");
    :     }
    :     // Current position as systemId:line:column, or "" if no locator was supplied
    :     private String locate() {
    :         if (locator != null) {
    :             return locator.getSystemId() + ":" + locator.getLineNumber()
    :                     + ":" + locator.getColumnNumber();
    :         }
    :         return "";
    :     }
    : }
    : //file SAXParseXML.java
    : import org.xml.sax.*;
    : import org.xml.sax.helpers.ParserFactory;
    : import java.io.*;
    : public class SAXParseXML {
    :     public static void main(String[] args) {
    :         if (args.length > 1) {
    :             try {
    :                 Parser p = ParserFactory.makeParser(args[0]);
    :                 p.setDocumentHandler(new SAXDocHandler());
    :                 p.setErrorHandler(new SAXErrHandler());
    :                 p.setEntityResolver(new SAXEntityResolver());
    :                 File f = new File(args[1]);
    :                 InputSource src = new InputSource(new FileInputStream(f));
    :                 src.setSystemId(f.toURL().toString());
    :                 p.parse(src);
    :             } catch (Exception e) {
    :                 e.printStackTrace();
    :             }
    :         }
    :     }
    : }
    : //file dinner.xml
    : <?xml version="1.0"?>
    : <!DOCTYPE dinner SYSTEM "dinner.dtd">
    : <dinner>
    : <location planet="Earth">Alma 3</location>
    : <time>12:30</time>
    : <date>Vandaag</date>
    : </dinner>
    : //file dinner.dtd
    : <?xml version="1.0" encoding="UTF-8"?>
    : <!ELEMENT dinner (location, time, date?)>
    : <!ELEMENT location (#PCDATA)>
    : <!ELEMENT time (#PCDATA)>
    : <!ELEMENT date (#PCDATA)>
    : <!ATTLIST location country CDATA "Belgium">
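
For comparison, here is a sketch of the same check against the SAX2 XMLReader API, assuming a JAXP parser is available (the thread itself uses the SAX1 Parser interface and the Oracle parser). The class name InMemoryDtdResolver and the in-memory DTD string are made up for illustration, and the DTD is adapted so that it declares the planet attribute used in dinner.xml. If the parser consults the resolver for the external DTD, the recorded system ID is non-null:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical resolver: serves the DTD from memory and records what was requested.
class InMemoryDtdResolver implements EntityResolver {
    static final String DINNER_DTD =
            "<!ELEMENT dinner (location, time, date?)>"
          + "<!ELEMENT location (#PCDATA)>"
          + "<!ATTLIST location planet CDATA #IMPLIED>"
          + "<!ELEMENT time (#PCDATA)>"
          + "<!ELEMENT date (#PCDATA)>";

    String requestedSystemId; // stays null if the parser never asks

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requestedSystemId = systemId;
        if (systemId != null && systemId.endsWith("dinner.dtd")) {
            return new InputSource(new StringReader(DINNER_DTD));
        }
        return null; // let the parser use its default resolution
    }

    // Parses the document and returns the system ID the parser asked the resolver for.
    static String parseAndReport(String xml) throws Exception {
        InMemoryDtdResolver resolver = new InMemoryDtdResolver();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setEntityResolver(resolver);
        reader.setContentHandler(new DefaultHandler());
        reader.parse(new InputSource(new StringReader(xml)));
        return resolver.requestedSystemId;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE dinner SYSTEM \"dinner.dtd\">"
                + "<dinner><location planet=\"Earth\">Alma 3</location>"
                + "<time>12:30</time><date>Vandaag</date></dinner>";
        System.out.println("resolveEntity asked for: " + parseAndReport(xml));
    }
}
```

The system ID handed to the resolver is usually already made absolute by the parser, which is why the sketch matches on the file name suffix rather than on the literal "dinner.dtd".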

  • Create an XML Document with HTML Entities?

    I'm writing a program that uses SAX to parse an XML document and generate XHTML from it, but I would like the XML document to allow HTML entities, of course, in the XML markup.
    Each time my parser gets to any entity at all it dies reporting an undefined entity.
    Is there any way I can refer to the HTML entity DTD or perhaps, in my own DTD, copy and paste the HTML entities? How would I make a reference like this in my XML document?
    Thanks

    Seems to me you would just put something like
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
            "http://www.w3.org/TR/REC-html40/loose.dtd">
    before your root element. Or whichever DTD you actually want. I found out about them here:
    http://www.utoronto.ca/webdocs/HTMLdocs/HTML_Spec/html.html
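
The question also raises the option of copying the entity declarations you need into your own DTD. A minimal sketch of that route, assuming only a couple of entities are needed and declaring them in the document's internal subset; the class name HtmlEntityDemo, the page element and the two chosen entities are illustrative only:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class HtmlEntityDemo {
    // Internal DTD subset declaring two HTML-style entities as character references.
    static final String DOCTYPE =
            "<!DOCTYPE page [\n"
          + "  <!ENTITY nbsp \"&#160;\">\n"
          + "  <!ENTITY eacute \"&#233;\">\n"
          + "]>\n";

    // Prepends the DOCTYPE to the given document body and returns all character data.
    static String textOf(String body) throws Exception {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(DOCTYPE + body)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("<page>caf&eacute;&nbsp;au lait</page>"));
    }
}
```

With the declarations in place, the parser expands the references itself and no longer reports them as undefined.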

  • SAX parser and EntityResolver

    I am using the sample SAXSample.java that comes as part of the XML parser download to parse an XML file that contains references to external entities - for example <!ENTITY E23678 SYSTEM "http://pilot/:13365">
    However the SAX parser never appears to call either the EntityResolver or DTDHandler handlers even though they are set in main.
    Has anyone else experienced this, or have I missed something out?
    Any help appreciated.

    public class YourParser extends DefaultHandler implements Runnable
    In your run method:
    saxParser.parse(fileToBeParsed, this); // saxParser is an object of javax.xml.parsers.SAXParser
    Provide implementations of the methods you want in the YourParser class.

  • SAX Parser XML Validation Problems

    Hi,
    I’m having problems getting an XML document to validate within WebLogic 8.1. I am trying to parse a document that references both a DTD and an XSD. Both the schema and the DTD reference need to be substituted so that they use local paths. I specify the schema the parser should use and have created an EntityResolver to change the DTD reference.
    When this runs as a standalone app from Eclipse, the file parses and validates without a problem. When deployed to the app server, the process seems to be unable to read the contents of the DTD. It's not that it cannot find the file (no FileNotFoundException is thrown, although I can provoke one by deleting the DTD); rather, it seems to find no declared elements.
    My initial thought was that the code didn’t have access to read the DTD from its location on disk. To check, I moved the DTD into the deployed WAR and referenced it as a resource. The problem still persists.
    Code Snippet:
    boolean isValid = false;
    try {
        // Create and configure factory
        SAXParserFactory factory = SAXParserFactoryImpl.newInstance();
        factory.setValidating(true);
        factory.setNamespaceAware(true);
        // To be notified of validation errors in the XML document,
        // add a custom error handler to the document builder
        PIMSFeedFileValidationHandler handler
                = new PIMSFeedFileValidationHandler();
        // Create and configure parser
        SAXParser parser = factory.newSAXParser();
        parser.setProperty(JAXP_SCHEMA_LANGUAGE, W3C_XML_SCHEMA);
        parser.setProperty(NAMESPACE_PROPERTY_KEY, getSchemaFilePath());
        // Set reader with EntityResolver for the DTD
        XMLReader xmlReader = parser.getXMLReader();
        xmlReader.setEntityResolver(new SAXEntityResolver(this.dtdPath));
        // Convert file to URL, as it is a remote file
        URL url = super.getFile().toURL();
        // Open an input stream and parse
        InputStream is = url.openStream();
        xmlReader.setErrorHandler(handler);
        xmlReader.parse(new InputSource(is));
        is.close();
        // Get the result of parsing the document by checking the
        // error handler's isValid property
        isValid = handler.isValid();
        if (!isValid) {
            LOGGER.warn(handler.getMessage());
        }
        LOGGER.debug("XML file is valid XML? " + isValid);
    } catch (ParserConfigurationException e) {
        LOGGER.error("Error parsing file", e);
    } catch (SAXException e) {
        LOGGER.error("Error parsing file", e);
    } catch (IOException e) {
        throw new FeedException(e);
    }
    return isValid;
    See the log output below for a little more info.
    2005-01-28 10:24:09,217 [DEBUG] [file] - Attempting validation of file 'cw501205.wa1.xml' with schema at 'C:/pims-feeds/hansard/schema/hansard-v1-9.xsd'
    2005-01-28 10:24:09,217 [DEBUG] [file] - Entity Resolver is using DTD path file:C:/Vignette/runtime_services/8.1/install/common/nodemanager/VgnVCMServer/stage/pims-hansard/pims-hansard.war/WEB-INF/classes/com/morse/pims/cms/feed/sax/ISO-Entities.dtd
    2005-01-28 10:24:09,227 [DEBUG] [file] - Creating InputSource at: file:C:/Vignette/runtime_services/8.1/install/common/nodemanager/VgnVCMServer/stage/pims-hansard/pims-hansard.war/WEB-INF/classes/com/morse/pims/cms/feed/sax/ISO-Entities.dtd
    2005-01-28 10:24:09,718 [WARN ] [file] - org.xml.sax.SAXParseException: Element type "Hansard" must be declared.
    org.xml.sax.SAXParseException: Element type "Session" must be declared.
    org.xml.sax.SAXParseException: Element type "DailyRecord" must be declared.
    org.xml.sax.SAXParseException: Element type "Volume" must be declared.
    org.xml.sax.SAXParseException: Element type "Written" must be declared.
    org.xml.sax.SAXParseException: Element type "WrittenHeading" must be declared.
    org.xml.sax.SAXParseException: Element type "Introduction" must be declared.
    … continues for all the elements in the doc
    2005-01-28 10:24:10,519 [DEBUG] [file] - XML file is valid XML? false
    2005-01-28 10:24:10,519 [WARN ] [file] - Daily Part file 'cw501205.wa1.xml' was not valid XML and was not processed.
    Has anybody seen this behavior before with WebLogic, and if so, how did you resolve the issue?
    Thanks in advance,
    Adam

    Hi David,
    I have checked the ejb-jar.xml file and there are no duplicate values in it. The other thing is that the same application has been deployed on OAS 10g and WebSphere, and it works fine there. In the forum, someone replied to a similar problem that there is a bug in WebLogic 10.3, CR no. 376292. I am not sure about it. Does anyone have any information about it?
    Thanks and regards,
    Deepak Dani

  • How to validate using schema and entities

    I am able to validate documents using w3c schema with either SAX or JDOM. But I need to add some entity definitions from a DTD.
    If I add a DOCTYPE to my document that references a DTD, the entities are handled correctly, but validation fails because the document is not valid per the DTD. I may be dealing with a lot of schemas and do not want to create the whole DTD for every schema.
    Is there a way to validate using the schema and get the entity declarations without doing dtd validation?? I am using jaxp-1.2.2 (wsdp 1.1) and jdom beta-8.

    Declare only the Entities in the Dtd.
    Validate with schema & validate with Dtd containing
    only the entities.
    I did create a DTD that contains only the entities. The parser will correctly parse this, but validation fails because the entire document is not DTD-valid.
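
One way to get both behaviors is to leave DTD validation off and attach the schema to the parser factory, so the DTD is read only for its entity declarations. A minimal sketch, assuming JAXP 1.3 or later (the javax.xml.validation package, which the jaxp-1.2.2 release named in the question does not include); the class name SchemaWithDtdEntities, the tiny schema and the co entity are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

class SchemaWithDtdEntities {
    // Illustrative schema: a single string-valued root element named note.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"note\" type=\"xs:string\"/>"
          + "</xs:schema>";

    // Returns the document's text; throws SAXParseException on a schema violation.
    static String parseValidated(String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setValidating(false); // no DTD validation; entity declarations are still read
        factory.setSchema(schema);    // schema validation happens during the parse

        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
            @Override
            public void error(SAXParseException e) throws SAXParseException {
                throw e; // treat validation errors as fatal for this sketch
            }
        };
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE note [<!ENTITY co \"Example Corp\">]>"
                + "<note>Hello &co;</note>";
        System.out.println(parseValidated(xml));
    }
}
```

The entities here live in the internal subset. With an external entities-only DTD, the same setup relies on the parser loading external DTDs while not validating, which is a parser-specific default worth checking.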

  • Determine System Entities in a XML File

    Hello,
    I have some XML files with the following content:
    <?xml version="1.0" encoding="iso-8859-1"?>
    <?xml-stylesheet type="text/xsl" href="style/webstyle.xsl"?>
    <!DOCTYPE contentobject PUBLIC "-//Test//DTD Content v1.1 20021018//EN"
    "dtd/content.dtd" [
    <!ENTITY test1 SYSTEM "images/photo.gif">
    <!ENTITY test2 SYSTEM "texts/hello.doc">
    ]>
    <content>
    </content>
    Now I want to determine which entities are declared in the document's own DOCTYPE.
    The result should be:
    test1, SYSTEM, "images/photo.gif"
    test2, SYSTEM, "texts/hello.doc"
    I wrote the following code (JDK 1.4 parser):
    public static void getDocumentEntities(File xmlFile) {
        Document document = null;
        try {
            DocumentBuilderFactory factory =
                    DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            document = builder.parse(xmlFile);
            DocumentType dtd = document.getDoctype();
            if (dtd != null) {
                NamedNodeMap ent = dtd.getEntities(); // ????
                for (int i = 0; i < ent.getLength(); i++) {
                    Node node = ent.item(i);
                    System.out.print("Ent: " + node.getNodeName());
                    // System.out.print("NotationName: " + node.getn);
                    System.out.print(" PublicId: " + node.getNodeName());
                    System.out.println(" SystemID: " + node.getNodeName());
                }
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
    The result is unsatisfactory.
    First, I get hundreds of entities (probably from the external DTD) and not only the two above.
    Second, I can't get the path of the files, for example "images/photo.gif".
    Does anyone have code for this problem? Should I use the DOM or the SAX parser for this?
    I would be very happy if anybody could help me with this problem.
    Best regards, Theodore

    Too difficult???
    Please help me.
    Thank you, Theo
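
A possible SAX route, sketched under assumptions: the SAX2 extension handlers report external entity declarations (DeclHandler.externalEntityDecl) and mark where the external DTD subset begins (LexicalHandler.startEntity with the name "[dtd]"), so declarations seen before that point belong to the internal subset. The class name InternalSubsetEntities is made up, and the sample document omits the external DTD so the sketch runs without extra files. The reported system ID may be made absolute by the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical collector: lists external entities declared in the internal subset.
class InternalSubsetEntities extends DefaultHandler2 {
    final List<String> names = new ArrayList<>();
    final List<String> systemIds = new ArrayList<>();
    private boolean inExternalSubset = false;

    @Override
    public void startEntity(String name) {
        if ("[dtd]".equals(name)) {
            inExternalSubset = true; // the external DTD subset begins here
        }
    }

    @Override
    public void endEntity(String name) {
        if ("[dtd]".equals(name)) {
            inExternalSubset = false;
        }
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId) {
        if (!inExternalSubset) { // keep only declarations from the document itself
            names.add(name);
            systemIds.add(systemId);
        }
    }

    static InternalSubsetEntities collect(String xml) throws Exception {
        InternalSubsetEntities handler = new InternalSubsetEntities();
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Standard SAX2 property names for the two extension handlers.
        reader.setProperty("http://xml.org/sax/properties/declaration-handler", handler);
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE content ["
                + "<!ENTITY test1 SYSTEM \"images/photo.gif\">"
                + "<!ENTITY test2 SYSTEM \"texts/hello.doc\">"
                + "]><content/>";
        InternalSubsetEntities result = collect(xml);
        for (int i = 0; i < result.names.size(); i++) {
            System.out.println(result.names.get(i) + ", SYSTEM, " + result.systemIds.get(i));
        }
    }
}
```

On the DOM side, the nodes returned by getEntities() can be cast to org.w3c.dom.Entity, whose getPublicId(), getSystemId() and getNotationName() methods return the values that getNodeName() cannot. Entities declared with NDATA are reported through DTDHandler.unparsedEntityDecl rather than through DeclHandler.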

  • SAX Entity References

    Hi,
    I am currently trying to create software using SAX. I am parsing XML files, allowing them to be searched and modified, and also offering the option to write them back to a file. However, the input files contain entity references such as &amp;. I know the parser's job is to replace them with the corresponding character, but when the file is written back, the actual character is written instead of the entity reference, which makes the XML file invalid.
    Is there a way to stop SAX from replacing the entity references? I can't see a parser feature that allows this. Or is the only solution to replace the characters with entity references before writing them to the file?
    Any help would be much appreciated.
    Thanks,
    Martin

    MartinSurf wrote:
    However, the input files contain entity references such as &amp;. I know the parser's job is to replace them with the corresponding character, but when the file is written back, the actual character is written instead of the entity reference, which makes the XML file invalid.

    If this is happening (the part about producing invalid XML files as output) for the built-in character entities like &amp;, then it's your output code that is the problem.

  • XMLDecoder and Entities

    Hi,
    I would like to restore a couple of beans using XMLDecoder. However, I would like to include entities in the XML source that point to other XML sources. In other words, I would like to restore the beans from multiple XML files that are 'connected' by <!ENTITY name  SYSTEM "otherFile.xml"> declarations.
    Everything works fine so far, except that the SAX parser looks for the entities in the wrong place. If no absolute path is given, the parser uses the current working directory as the base folder for resolving the entity location, not the location of the declaring file!
    Question: Is there a way to tell XMLDecoder to resolve entities relative to the declaring document, not to some "arbitrary" working directory? (I guess this is how the XML specification says it should work, right?)
    Thanks,
    Marcus
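
    A parser can only resolve a relative SYSTEM identifier against the declaring document if it knows where that document lives. When it is given a bare stream, it has no base URI and typically falls back to the working directory. A minimal sketch, assuming the document is read from a file; the class name BaseUriSource and the path "beans/main.xml" are placeholders for illustration.

```java
import java.io.File;
import org.xml.sax.InputSource;

public class BaseUriSource {
    // Builds an InputSource that carries the file's own URI as its
    // system ID, so the parser has a base for relative identifiers.
    public static InputSource forFile(File xmlFile) {
        return new InputSource(xmlFile.toURI().toASCIIString());
    }

    public static void main(String[] args) {
        // Placeholder path; the file does not need to exist for this demo.
        InputSource source = forFile(new File("beans/main.xml"));
        System.out.println(source.getSystemId());
    }
}
```

    In Java 7 and later, XMLDecoder has a constructor that accepts an InputSource, so a source built this way could be passed to it. Whether a particular JDK release then follows external entities is an assumption to verify, since some releases restrict them for security reasons.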

    Hi,
    it seems to me that XMLEncoder retrieves the hash map with the get method, expecting to get the original, modifiable reference to the hash map, and executes put statements on this reference.
    I had expected (and it would be the logical way, too) that XMLDecoder creates a new hash map with the stored values and sets it via the set method.
    Is my assumption correct?
    If yes, does anybody know why this approach (get method) is used and not the other one (set method)?
    The current approach restricts what you can do in the get method and with the hash map itself (e.g. Collections.unmodifiableMap() isn't possible, and returning a new Map filled with putAll isn't possible either).
    Thanks a lot in advance.
    Greetings Michael

  • Why does the parser convert escape entities automatically?

    Hi,
    I'm parsing XML with JDK 1.4.2 + SAX (Crimson).
    I found that the parser automatically converts some escape entities, such as "&gt;" or "&lt;".
    I wrote a simple SAX application to show the problem.
    The full code is as follows:
    // SimpeXmlHandler.java
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;
    public class SimpeXmlHandler extends DefaultHandler {
         // Collects the character data of the element currently being read.
         private StringBuffer str = null;
         public void startElement(String namespaceURI, String localName,
                   String qName, Attributes attributes) throws SAXException {
              str = new StringBuffer();
         }
         public void endElement(String uri, String localName, String qName)
                   throws SAXException {
              if (str != null) {
                   if (qName.equalsIgnoreCase("tag")) {
                        System.out.println("tag=" + str.toString().trim());
                   }
              }
         }
         public void characters(char[] chars, int start, int length)
                   throws SAXException {
              str.append(chars, start, length);
         }
    }
    // SimpleXmlTest.java
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.InputSource;
    import org.xml.sax.XMLReader;
    public class SimpleXmlTest {
         public static void main(String[] args) throws Exception {
              SimpeXmlHandler handler = new SimpeXmlHandler();
              SAXParserFactory factory = SAXParserFactory.newInstance();
              factory.setValidating(false);
              SAXParser parser = factory.newSAXParser();
              XMLReader xmlReader = parser.getXMLReader();
              xmlReader.setContentHandler(handler);
              InputStream in = new FileInputStream(new File("sample.xml"));
              InputSource source = new InputSource(in);
              xmlReader.parse(source);
         }
    }
    The XML sample is:
    <?xml version="1.0" encoding="UTF-8"?>
    <root>
         <tag>a&gt;b</tag>
    </root>
    After running SimpleXmlTest, the output was:
    tag=a>b
    This isn't the result I want.
    I don't want the parser to convert the characters.
    How can I prevent that?
    P.S.
    I also tried DOM (Dom4J), and it didn't work either.
    Thanks!
    a cup of Java, cheers!
    Sha Jiang

    The XML parser converts all entities before the data is returned to the program, either as SAX events or as DOM nodes.
    Likewise, an XML writer will (normally) convert all characters that would make the output not well-formed back into entities.
    So yes, this behaviour cannot be switched off, and it should actually make it easier to work with XML.
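
    The behaviour described above is easy to reproduce without any files. The following is a minimal sketch that parses a string in memory; the class name EntityExpansionDemo is made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EntityExpansionDemo {
    // Parses the XML string and returns all character data it contains,
    // exactly as the parser reports it to the handler.
    public static String textOf(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The predefined reference arrives already expanded: prints a>b
        System.out.println(textOf("<tag>a&gt;b</tag>"));
    }
}
```

    The handler never sees the five characters "&gt;", only the single character ">". If the escaped form is needed again, it has to be produced when the data is written out.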

  • SAX entityresolver

    I'm trying to get my Java XML application (which uses DOM, incidentally) to resolve external entities, and am looking for an implementation of org.xml.sax.EntityResolver.resolveEntity(String publicId, String systemId) which doesn't return null by default (i.e. one that will actually run off and try to find the external entity over the internet).
    Does anyone have or can anyone find any help with this???

    Hi Tristan,
    You don't need an entity resolver that does not return null. Returning null is a behaviour that asks the parser to open a regular URI connection to the system identifier.
    see: http://java.sun.com/j2se/1.4.1/docs/api/org/xml/sax/EntityResolver.html#resolveEntity(java.lang.String,%20java.lang.String)
    If you find that the parser can't resolve the URI, it might be due to a firewall. In that case, ask a network administrator which proxy server and parameters should be used, and set the Java proxy system properties on the command line.
    Cheers
    Benoit
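
    To make the null contract concrete, here is a minimal sketch of a resolver that substitutes a local copy for one known system identifier and returns null for everything else, which leaves the lookup to the parser. The class name, the URL and the entity text are hypothetical and chosen only for illustration.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalCopyResolver implements EntityResolver {
    // Hypothetical system ID and replacement text, for illustration only.
    public static final String KNOWN_ID = "http://example.com/entities/common.ent";
    public static final String LOCAL_TEXT = "<!ENTITY part \"downtown\">";

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        if (KNOWN_ID.equals(systemId)) {
            // Serve the local copy instead of going to the network.
            InputSource local = new InputSource(new StringReader(LOCAL_TEXT));
            local.setSystemId(systemId);
            return local;
        }
        // null asks the parser to open the system identifier itself.
        return null;
    }
}
```

    With DOM, such a resolver would be installed through DocumentBuilder.setEntityResolver(). If the default lookup fails behind a firewall, the proxy can be passed as system properties, for example -Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=8080 (host and port here are placeholders).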

  • Xml-sax

    Hi all,
    I am getting the following error while running a Java program. Can anyone help me set the SAX driver?
    log:
    java.lang.ClassNotFoundException: org.apache.crimson.parser.XMLReaderImpl
         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
         at MySAXApp.main(MySAXApp.java:17)
    Exception in thread "main"
    thanks in advance..!

    log:
    java.lang.ClassNotFoundException: org.apache.crimson.parser.XMLReaderImpl
         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
         at org.xml.sax.helpers.XMLReaderFactory.createXMLReader(Unknown Source)
         at MySAXApp.main(MySAXApp.java:17)
    Exception in thread "main"

    That class is found in "rt.jar" in JDK 1.4 onwards. Are you using 1.4 or higher?
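
    One likely cause of this error is that XMLReaderFactory.createXMLReader() is looking up a driver class name (for example via the org.xml.sax.driver system property) that is not on the classpath of the running JVM. A way to avoid naming a driver class at all is to obtain the reader through JAXP. This is a sketch under that assumption, not a diagnosis of the poster's setup; the class name ReaderLookup is made up.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class ReaderLookup {
    // Asks JAXP for whatever SAX parser the running JVM provides,
    // without hard-coding an implementation class name.
    public static XMLReader newReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException("no SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        // Shows which implementation class was actually selected.
        System.out.println(newReader().getClass().getName());
    }
}
```

    The printed class name differs between JDK versions, which is why code that hard-codes a parser class can work on one JVM and fail on another.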

Maybe you are looking for