JAXP seems to be stripping off DOCTYPE tag

When I parse a document, the DOCTYPE tag is getting stripped off.
To parse, I read the document from file and it looks like :
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j="http://jakarta.apache.org/log4j">
After parsing, I dump it to the console and it looks like:
<?xml version="1.0" encoding="UTF-8"?>
<log4j:configuration debug="null" threshold="null" xmlns:log4j="http://jakarta.apache.org/log4j">
The mystery is that the output has two attributes set to their defaults, and the only way the parser could know those defaults is by reading the DTD. So why does the output not include the DOCTYPE tag?
The parsing code snippet is:
Document doc = null;
DocumentBuilderFactory dbFactory = null;
dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
DocumentBuilder db = dbFactory.newDocumentBuilder();
db.setEntityResolver(new LogDTDResolver());
doc = db.parse(is);  // is is an InputStream
Anyone have a clue? Is there a property I am missing? I have searched for the complete set of JAXP properties, but I can't find one.
Thanks for any help.

So it looks like log4j is doing its own parsing and it requires the DTD. Don't know how (or why) it does that.
You call the setOutputProperty() method of the Transformer. This is designed to configure the Transformer with properties that are normally set in the <xsl:output> element of an XSL transformation. The properties you need are "doctype-system" and maybe "doctype-public".
You're quite right, it isn't obvious. That's what happens when systems are designed by architects who believe in abstraction too much.
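
A minimal sketch of that suggestion, assuming the document is written back out through an identity Transformer. The names DoctypeOutputSketch, parse and serialize are made up for illustration. The entity resolver here simply substitutes an empty DTD so the sketch runs without log4j.dtd on disk; it stands in for the poster's own LogDTDResolver, which is not shown in the thread.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DoctypeOutputSketch {

    /** Parses the XML text; every external DTD is replaced by an empty one (assumption for the demo). */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** Serializes the DOM and asks the Transformer to emit a DOCTYPE with the given system identifier. */
    static String serialize(Document doc, String systemId) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Same effect as doctype-system="..." on <xsl:output> in a stylesheet.
        transformer.setOutputProperty("doctype-system", systemId);
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE log4j:configuration SYSTEM \"log4j.dtd\">"
                + "<log4j:configuration xmlns:log4j=\"http://jakarta.apache.org/log4j\"/>";
        System.out.println(serialize(parse(xml), "log4j.dtd"));
    }
}
```

The DOCTYPE is still in the parsed Document (doc.getDoctype()); it is the serialization step that has to be told to write one.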

Similar Messages

  • Stripping off XML tags

    Hi,
    Can someone help me with providing some code on how to strip off the XML tags in an XML file like the following:
    <?xml version = '1.0'?>
    <!--<?xml-stylesheet type="text/xsl" href="XYZ.xsl"?> -->
    <ROWSET>
    <ROW num="1">
    <PRODUCT_NAME>ABC</PRODUCT_NAME>
    </ROW>
    <ROW num="2">
    All I want to get back is whatever is between the <PRODUCT_NAME> tags. Basically, I need to get rid of all the <..> tags in this XML file.
    Does anybody know if I can use regular expressions in Java? Would that be easier than using a parser?
    Please provide me ideas, sources, examples, etc.
    Thanks in advance.

    I worked on a solution, but it comes up with a blank list.
    Any idea how to resolve this?
    Here's the code:
    import java.awt.BorderLayout;
    import java.awt.Dimension;
    import javax.swing.DefaultListModel;
    import javax.swing.JFrame;
    import javax.swing.JList;
    import javax.swing.JScrollPane;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;

    public class Test extends JFrame {
         JList list;
         JScrollPane listContainer;

         public Test() {
              setSize(300, 300);
              setVisible(true);
              setTitle("ProdList");
              initialize();
              // domParse();     // Getting error- how do i call this function?
         }

         public void initialize() {
              list = new JList(new DefaultListModel()); // Set the initial model
              listContainer = new JScrollPane(list);
              listContainer.setSize(new Dimension(200, 200));
              getContentPane().setLayout(new BorderLayout());
              getContentPane().add(listContainer, "Center");
              validate(); // Validate the screen
         }

         public void domParse(String url) {
              DocumentBuilder parser;
              DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
              try {
                   parser = factory.newDocumentBuilder();
                   Document doc = parser.parse(url);
                   NodeList product_names = doc.getElementsByTagName("PRODUCT_NAME");
                   if (product_names != null) {
                        for (int i = 0; i < product_names.getLength(); ++i) {
                             // get all child nodes of a <product_name> - node
                             NodeList product_name_children = product_names.item(i).getChildNodes();
                             // and display contents of text nodes
                             for (int j = 0; j < product_name_children.getLength(); ++j) {
                                  Node node = product_name_children.item(j);
                                  //if(node.getNodeType()==Node.TEXT_NODE)
                                  //System.out.println("element <"+product_names.item(i).getNodeName()+">'s text=["+node.getNodeValue()+"]");
                             }
                        }
                   } else {
                        System.out.println("no element <PRODUCT_NAME> found");
                   }
              } catch (Exception e) {
                   // e.printStackTrace();
              }
         }

         static public void main(String[] args) {
              Test x = new Test();
              x.domParse("Test.xml");
         }
    }
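
As posted, the code never adds anything to the DefaultListModel, which would explain the blank list, and the commented-out domParse() call fails because the method expects a String argument. Below is a hedged sketch of just the extraction step, assuming the goal is the text of every PRODUCT_NAME element. ProductNameSketch and productNames are hypothetical names, and the XML is passed in as a string rather than read from Test.xml.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ProductNameSketch {

    /** Returns the trimmed text content of every PRODUCT_NAME element, in document order. */
    static List<String> productNames(String xml) throws Exception {
        DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = parser.parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("PRODUCT_NAME");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() concatenates the text nodes below the element.
            names.add(nodes.item(i).getTextContent().trim());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<ROWSET><ROW num=\"1\"><PRODUCT_NAME>ABC</PRODUCT_NAME></ROW></ROWSET>";
        System.out.println(productNames(xml));
    }
}
```

To show the names in the JList, each returned string would then be passed to addElement() on the list's DefaultListModel.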

  • Stripping Off ?xml version="1.0" ? tag in the final output XML

    Hi All,
    Is there any way to strip the <?xml version="1.0" ?> tag from the final XML generated by BPEL?
    I have commented out <?xml version="1.0" ?> in the XSL mapper file, but it still appears in the generated output file.
    Any help or pointer is really appreciated.
    Thanks,
    Dibya

    Hi,
    I want to strip it off because I am appending the same declaration in a Java program.
    Please let me know how it can be done.
    Appreciate your inputs.
    Thanks,
    Dibya
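
Since a Java program handles the XML afterwards anyway, one option is to drop the declaration there. This is a sketch under that assumption, using the standard JAXP output property; OmitDeclarationSketch and withoutDeclaration are hypothetical names. In a stylesheet the corresponding setting is omit-xml-declaration="yes" on <xsl:output>; whether the BPEL engine honours it is not something this thread confirms.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class OmitDeclarationSketch {

    /** Re-serializes the XML through an identity transform without the <?xml ...?> declaration. */
    static String withoutDeclaration(String xml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(withoutDeclaration("<?xml version=\"1.0\" ?><a><b>x</b></a>"));
    }
}
```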

  • Stripping all HTML tags from a CLOB

    Hi all,
    Running Oracle 9.2.0.8 on AIX...
    We have a table which stores HTML document fragments in a clob. I have a requirement to convert these to plain/text (strip all HTML tags) for sending in a plain/text email body.
    I have read the following solution from Tom Kyte's site:
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:25695084847068
    Basically creating an Oracle text index on the CLOB column and calling ctx_doc.filter with "plaintext" parameter set to true.
    I noticed in Tom's example, he uses the default filter, which based on the docs, is NULL_FILTER, which applies no filtering. I have tried his example in my dev box, creating the text index on the CLOB column with no parameters.
    The call to ctx_doc.filter did not filter the html at all. I re-created the index and specified the INSO_FILTER and the filtering was done. I was under the impression that INSO_FILTER was for filtering binary content to plaintext...
    create table filter ( query_id number, document clob );
    create table demo
      ( id            int primary key,
        theclob       clob
      );
    create index demo_idx on demo(theClob) indextype is ctxsys.context;
    SET DEFINE OFF;
    Insert into DEMO
       (ID, THECLOB)
    Values
       (1, '<html><body><p>This is a test of <strong>ctx_doc.filter</strong> and plaintext filtering.</p></body></html>');
    COMMIT;
    exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
    The above code does not convert the HTML to plain text...
    Now re-create the index with INSO_FILTER:
    drop index demo_idx;
    create index demo_idx on demo(theClob) indextype is ctxsys.context parameters ('filter ctxsys.inso_filter');
    exec ctx_doc.filter('demo_idx',1, 'filter',1, true);
    The above scenario returns the string "This is a test of ctx_doc.filter and plaintext filtering."
    The Oracle documentation doesn't specify any special filter parameter that needs to be set... just wondering if I'm missing something here... or better yet, if there is a better solution to my problem. ;-)
    Thanks
    Stephane

    The difference between what you did and what Tom Kyte did is that you created your index on a clob column and Tom created his index on a blob column. What I don't know is why that makes a difference. I have demonstrated below with one blob column and one clob column, one index on the blob and one index on the clob, using the same code on both, with different results.
    SCOTT@orcl_11gR2> create table filter
      2    (query_id  number,
      3       document  clob)
      4  /
    Table created.
    SCOTT@orcl_11gR2> create table demo
      2    (id       int primary key,
      3       theblob   blob,
      4       theclob   clob)
      5  /
    Table created.
    SCOTT@orcl_11gR2> create index demo_blob_idx
      2  on demo (theblob)
      3  indextype is ctxsys.context
      4  /
    Index created.
    SCOTT@orcl_11gR2> create index demo_clob_idx
      2  on demo (theclob)
      3  indextype is ctxsys.context
      4  /
    Index created.
    SCOTT@orcl_11gR2> insert into demo values
      2    (1,
      3       utl_raw.cast_to_raw (
      4         '<html>
      5            <body>
      6              <p>
      7             This is a test of
      8             <strong> ctx_doc.filter </strong>
      9             and plaintext filtering.
    10              </p>
    11            </body>
    12          </html>'),
    13       '<html>
    14          <body>
    15            <p>
    16              This is a test of
    17              <strong> ctx_doc.filter </strong>
    18              and plaintext filtering.
    19            </p>
    20          </body>
    21        </html>')
    22  /
    1 row created.
    SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_blob_idx', 1, 'filter', 1, true)
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> exec ctx_doc.filter ('demo_clob_idx', 1, 'filter', 2, true)
    PL/SQL procedure successfully completed.
    SCOTT@orcl_11gR2> select id, utl_raw.cast_to_varchar2 (theblob), theclob from demo
      2  /
            ID
    UTL_RAW.CAST_TO_VARCHAR2(THEBLOB)
    THECLOB
             1
    <html>
            <body>
              <p>
                This is a test of
                <strong> ctx_doc.filter </strong>
                and plaintext filtering.
              </p>
            </body>
          </html>
    <html>
          <body>
            <p>
              This is a test of
              <strong> ctx_doc.filter </strong>
              and plaintext filtering.
            </p>
          </body>
        </html>
    1 row selected.
    SCOTT@orcl_11gR2> select query_id, document from filter
      2  /
      QUERY_ID
    DOCUMENT
             1
    This is a test of ctx_doc.filter and plaintext filtering.
             2
    <html>
          <body>
            <p>
              This is a test of
              <strong> ctx_doc.filter </strong>
              and plaintext filtering.
            </p>
          </body>
        </html>
    2 rows selected.
    SCOTT@orcl_11gR2>

  • Strip off markups of generic XML data with E4X

    I have a generic XML file:
    <nodes>
    <node1 att1='abc' att2='xyz'>
    <ele1> Hello </ele1>
    <ele2> Hi </ele2>
    </node1>
    </nodes>
    The tag and attribute names above can be anything. I need a
    generic method to strip off the XML markups and display the
    contents as:
    node1@att1: abc
    node1@att2: xyz
    node1.ele1: Hello
    node1.ele2: Hi
    How can E4X do this?

    e4x is for manipulating/navigating your xml. But you can use
    it inside a for/each loop to look at your xml nodes and extract the
    strings without the xml tags using toString().
    Search the help docs for "XML type conversion" and see the
    toString() method in action.

  • Upload to Mobile Me strips off EXIF information?

    If I plug in the iPhone, import the pictures I've taken into my iPhoto Library, upload them to a MobileMe Web Gallery, and look at their EXIF information (using the "i" button in a slideshow), all the GPS information and other tags are on there.
    If I choose a photo on the iPhone itself and send it to mobile me, it arrives there with no GPS information and other tags are missing. Why? How in the heck can we do mobile geotagging if the system is stripping off all the GPS information?

    Go to the System/MobileMe preference pane, log out, log in with a bogus name and password (this clears the name and password cache) and log in with the correct MMe account name and password. Then try again.

  • Generating !DOCTYPE tag for a Document

    I am using the SDK JAXP implementation of DOM to parse a simple XML document containing a !DOCTYPE tag, make a minor modification and write it back to a file. All works fine except that the !DOCTYPE tag is not written to the StreamResult by the Transformer.
    getDoctype() on the Document does return a Doctype object that was parsed, but it's not written when transforming the Document into a StreamResult.
    How do I write the !DOCTYPE tag into the output?
    Please help.
    Nikhil

    Set DOCTYPE_SYSTEM output property.
    TransformerFactory tFactory = TransformerFactory.newInstance();
    Transformer transformer = tFactory.newTransformer();
    transformer.setOutputProperty(javax.xml.transform.OutputKeys.DOCTYPE_SYSTEM, "exampleDtd.dtd");
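
Rather than hard-coding the DTD name, the identifiers can be copied from the DocumentType that getDoctype() returns. This is only a sketch: KeepDoctypeSketch, copyDoctype and roundTrip are hypothetical names, and the empty-DTD entity resolver is an assumption made so the example runs without the DTD file present.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentType;
import org.xml.sax.InputSource;

class KeepDoctypeSketch {

    /** Copies the parsed DOCTYPE's public and system identifiers onto the Transformer. */
    static void copyDoctype(Document doc, Transformer transformer) {
        DocumentType doctype = doc.getDoctype();
        if (doctype == null) {
            return; // the source document had no DOCTYPE
        }
        if (doctype.getPublicId() != null) {
            transformer.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC, doctype.getPublicId());
        }
        if (doctype.getSystemId() != null) {
            transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, doctype.getSystemId());
        }
    }

    /** Parses the XML text and writes it back out, keeping the DOCTYPE identifiers. */
    static String roundTrip(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Assumption for the demo: substitute an empty DTD instead of loading the real one.
        builder.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        copyDoctype(doc, transformer);
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Note that only the identifiers are carried over this way; an internal DTD subset, if the document has one, is not covered by these two output properties.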

  • DOCTYPE tag above CF's form validation javascript

    Hi,
    CF 8 seems to put the form validation JavaScript at the very top of the page, even before the HTML tag and the DOCTYPE tag.  This is causing problems with my style sheet.  Do you know of a way I can (at the very least) put my DOCTYPE tag above the JavaScript that CF renders?
    Thank you,
    David
    <script type="text/javascript" src="/CFIDE/scripts/cfform.js"></script>
    <script type="text/javascript" src="/CFIDE/scripts/masks.js"></script>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">

    Include the head tag, to entice Coldfusion to put the script tags there. Something like this
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head><title>test page</title></head>
    <body >
    <cfform>
    </cfform>
    </body>
    </html>

  • Java mapping for Remove and Add of  DOCTYPE Tag

    HI All,
    I have an issue with a Java mapping that removes and adds the DOCTYPE tag in an Operation Mapping.
    When I test it in the Configuration Test, it says: "Problem while determining receivers using interface mapping: Error while determining root tag of XML"
    Receiver Determination...
    Error in SXMB_MONI:
    <SAP:Category>XIServer</SAP:Category>
      <SAP:Code area="RCVR_DETERMINATION">CX_RD_PLSRV</SAP:Code>
      <SAP:P1>Problem while determining receivers using interface mapping: Error while determining root tag of XML: '<!--' or '<![CDATA[' expected</SAP:P1>
    Please suggest a solution.
    Thanks in advance.

    Hi Mahesh,
    I understand that you are using extended Receiver Determination with an Operation Mapping (which contains a Java Mapping), and that the error message "Error while determining root tag of XML" appears when you run the configuration test.
    Can you please test the Operation Mapping (with the Java Mapping) separately in ESR, using the payload that is arriving now? It should produce XML something like this (see http://help.sap.com/saphelp_nwpi711/helpdata/en/48/ce53aea0d7154ee10000000a421937/frameset.htm):
    <Receivers>
    <Receiver>
      <Party agency="016" scheme="DUNS">123456789</Party>
      <Service>MyService</Service>
    </Receiver>
    <Receiver>
      <Party agency="http://sap.com/xi/XI" scheme="XIParty"></Party>
      <Service>ABC_200</Service>
    </Receiver>
    </Receivers>
    If it does not (and I think it will not), then there is a problem in the Java Mapping code. Please correct it. As a last option, if your Java code is short, you may paste it here so that we can look for the cause of the issue.
    Regards,
    Raghu_Vamsee

  • Ignore DOCTYPE tag in xml

    Hi!,
    I need to remove DOCTYPE tag from the xml file. The parser is trying to validate the DTD. I do not want to validate, I just want to parse. I could not find any documentation on how this can be done. if any one of you have done this please let me know how you have managed to ignore the DOCTYPE tag.
    Regards

    Here is one way to do it:
    Example taglib DOCTYPE:
    <!DOCTYPE taglib
    PUBLIC "-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.2//EN"
    "http://java.sun.com/dtd/web-jsptaglibrary_1_2.dtd">
    Example Code:
    // Get your parser, then set its EntityResolver to your own custom er.
    parser.setEntityResolver(new EntityResolver() {
        public InputSource resolveEntity(String publicId, String systemId) {
            if ("-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.2//EN".equals(publicId)) {
                // Hand the parser an empty DTD instead of fetching the real one.
                return new InputSource(
                    new ByteArrayInputStream("<?xml version='1.0' encoding='UTF-8'?>".getBytes()));
            } else {
                return null; // fall back to the parser's default resolution
            }
        }
    });
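
For reference, a self-contained variant of the same idea that ignores every external DTD rather than one specific public ID. It is a sketch, not a drop-in: IgnoreDtdSketch and parseIgnoringDtd are hypothetical names, and the input is taken as a string for brevity. Because the real DTD is never read, any entity declarations or default attribute values it defines will be missing from the parsed document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class IgnoreDtdSketch {

    /** Resolver that answers every external entity request with empty content. */
    static final EntityResolver EMPTY_DTD =
            (publicId, systemId) -> new InputSource(new StringReader(""));

    /** Parses the XML text without fetching or validating against its DTD. */
    static Document parseIgnoringDtd(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false); // already the default; stated for clarity
        DocumentBuilder parser = factory.newDocumentBuilder();
        parser.setEntityResolver(EMPTY_DTD);
        return parser.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<!DOCTYPE taglib PUBLIC "
                + "\"-//Sun Microsystems, Inc.//DTD JSP Tag Library 1.2//EN\" "
                + "\"http://java.sun.com/dtd/web-jsptaglibrary_1_2.dtd\">"
                + "<taglib><tlib-version>1.0</tlib-version></taglib>";
        System.out.println(parseIgnoringDtd(xml).getDocumentElement().getNodeName());
    }
}
```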

  • HT200154 how do you turn off the apple TV box?  There doesn't seem to be an off button.  The status light stays on.

    how do you turn off the apple TV box?  There doesn't seem to be an off button.  The status light stays on.

    Usually, when you put it to sleep mode, the status light will turn off.  To put it into sleep mode, go to Settings.
    If you want it completely off, then I guess the only way to do it is to pull the electric plug.

  • Muse strips off digimarc copyright in jpeg files saved in photoshop. How do I prevent this from happening?

    In Muse, the digimarc watermark I added in jpegs saved in photoshop get stripped off when uploaded to my website. How can I prevent this from happening? The jpegs are set to their exact size in Muse. Thanks.

    Hello
    My procedure is to resize the original psd file in photoshop, change the mode to 8bits, apply the digimarc filter and then save as jpeg. I place the file in Muse without changing its size and upload. I then right click to save the uploaded image from my website and bring it into photoshop to verify the digimarc watermark. The digimarc filter shows there is no watermark at all.

  • [APEX4.0] - Websheet apps strips off national characters

    Hi,
    In APEX 4.0, when creating a page in a websheet app, the national characters in the content are stripped off (in my case Polish: ąćźżśń), although some characters remain unchanged (e.g. ó).
    Any ideas how to fix this?

    Hi,
    Probably the same (or a similar) problem as described here:
    4.0, web sheet data load, special characters (de, ÄÖÜ etc.) mess
    It would be nice to know whether this is a known issue that will be fixed, or whether we are dealing with a feature that should not be used by users with "uncommon" languages. -:)
    Regards
    Andre

  • How do I strip off a time format off of a string array

    How do I strip the first part of a time format out of a string array? The following is what it looks like:
    14 April 2008 10:00:00.000, 30.000,128.591,-145.839
     "       "      "     10:00:01.000, "              "              "
    I tried the Read From Spreadsheet File VI first, then lower-level VIs. There must be something simple I am just missing.

    Search for the first comma and take the data before it.

  • Problem with DOCTYPE tag in the index.html for flex file

    When I have the following !DOCTYPE tag in the index.html file I use to load the Flex SWF file(s), both the FireFox and Orical browsers become horizontally shortened.  When the tag is removed everything works fine.  Given that the !DOCTYPE tag is necessary for good HTML, I'd really like to know what's going on.
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

    Good info on that site about HTML; unfortunately it had nothing new for me.  I did double-check everything, and I was doing things the way they recommend.
    I'm really bothered that this is happening.  The DOCTYPE tag should be used and it should not have this effect when added to the HTML generated by the Flex Builder.  Additionally the new Flash Builder does have the DOCTYPE tag in its generated HTML and this is very worrisome.
