Problems with Large XML files

I have tried increasing the memory pool using the -mx and -ms options. It doesn't work. I am using your latest XML parser for Java v2. Please let me know if there are some specific options I should be using.
Thanks,
-Sameer
We have a number of test files of that size and they parse without a problem. However, using the DOMParser does require significantly more memory than your document size.
What is the memory configuration of the JVM that you are running with? Have you tried increasing it? Are you using our latest version 2.0.2.6?
Oracle XML Team
Is there a restriction on the XML file size that can be loaded into the parser?
I am getting an out-of-memory exception reading in a large XML file (10MB) using the following code:
DOMParser parser = new DOMParser();
URL url = createURL(argv[0]);
parser.setErrorStream(System.err);
parser.setValidationMode(true);
parser.showWarnings(true);
parser.parse(url);
Win NT 4.0 Server
Sun JDK 1.2.2
===================================
Error output
===================================
Exception in thread "main" java.lang.OutOfMemoryError
at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
at java.util.Hashtable.<init>(Unknown Source)
at oracle.xml.parser.v2.DTDDecl.<init>(DTDDecl.java, Compiled Code)
at oracle.xml.parser.v2.ElementDecl.getAttrDecls(ElementDecl.java, Compiled Code)
at oracle.xml.parser.v2.ValidatingParser.checkDefaultAttributes(ValidatingParser.java, Compiled Code)
at oracle.xml.parser.v2.NonValidatingParser.parseAttributes(NonValidatingParser.java, Compiled Code)
at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java, Compiled Code)
at oracle.xml.parser.v2.ValidatingParser.parseRootElement(ValidatingParser.java:97)
at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:199)
at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:146)
at TestLF.main(TestLF.java:40)
null

You might try using a different JDK/JRE - either a 1.1.6+ or 1.3 version, as 1.2 in our experience has the largest footprint. If this doesn't work, can you give us some details about your system configuration? Finally, you might try the SAX interface, as it does not need to load the entire DOM tree into memory.
Oracle XML Team
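For reference, a minimal sketch of the SAX approach follows. It uses the generic JAXP/SAX API rather than the Oracle-specific classes, the element-counting handler is only illustrative, and the file URL is assumed to come from the command line as in the DOMParser snippet above.
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
public class SaxCount {
    public static void main(String[] args) throws Exception {
        // SAX reports events as the document is read; it never builds the whole DOM tree in memory.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true);
        final int[] elements = {0};
        DefaultHandler handler = new DefaultHandler() {
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                elements[0]++; // do per-element work here instead of walking a DOM afterwards
            }
        };
        factory.newSAXParser().parse(args[0], handler);
        System.out.println("Elements seen: " + elements[0]);
    }
}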

Similar Messages

  • Problems with reading XML files with ISO-8859-1 encoding

    Hi!
I am trying to read an RSS file. The script below works with XML files with UTF-8 encoding but not ISO-8859-1. How do I fix it so it works with both?
    Here's the code:
    import java.io.File;
    import javax.xml.parsers.*;
    import org.w3c.dom.*;
    import java.net.*;

    /** @author gustav */
    public class RSSDocument {
        /** Creates a new instance of RSSDocument */
        public RSSDocument(String inurl) {
            String url = new String(inurl);
            try {
                DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = builder.parse(url);
                NodeList nodes = doc.getElementsByTagName("item");
                for (int i = 0; i < nodes.getLength(); i++) {
                    Element element = (Element) nodes.item(i);
                    NodeList title = element.getElementsByTagName("title");
                    Element line = (Element) title.item(0);
                    System.out.println("Title: " + getCharacterDataFromElement(line));
                    NodeList des = element.getElementsByTagName("description");
                    line = (Element) des.item(0);
                    System.out.println("Des: " + getCharacterDataFromElement(line));
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }

        public String getCharacterDataFromElement(Element e) {
            Node child = e.getFirstChild();
            if (child instanceof CharacterData) {
                CharacterData cd = (CharacterData) child;
                return cd.getData();
            }
            return "?";
        }
    }
    And here's the error message:
    org.xml.sax.SAXParseException: Character conversion error: "Malformed UTF-8 char -- is an XML encoding declaration missing?" (the line number may be too low).
        at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100)
        at org.apache.crimson.parser.InputEntity.fillbuf(InputEntity.java:1072)
        at org.apache.crimson.parser.InputEntity.isXmlDeclOrTextDeclPrefix(InputEntity.java:914)
        at org.apache.crimson.parser.Parser2.maybeXmlDecl(Parser2.java:1183)
        at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:653)
        at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
        at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
        at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
        at getrss.RSSDocument.<init>(RSSDocument.java:25)
        at getrss.Main.main(Main.java:25)

    I read files from the web, but there is an XML tag
    with the encoding attribute in the RSS file.

    If you are quite sure that you have an encoding attribute set to ISO-8859-1, then I expect that your RSS file has a non-ISO-8859-1 character, though I thought all bytes from -128 to 127 were valid ISO-8859-1 characters!
    Many years ago I had a problem with an XML file with invalid characters. I wrote a simple filter (using FilterInputStream) that made sure that all the bytes it processed were ASCII. My problem turned out to be characters with value zero, which the Microsoft XML parser failed to process. It put the parser into an infinite loop!
    In the filter, as each byte is read you could write out the hex value. That way you should be able to find the offending character(s).
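    A minimal sketch of the kind of diagnostic filter described above; the class name and the hex printout are illustrative, not the original poster's code:
    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    /** Passes bytes through unchanged, reporting the offset and hex value of any non-ASCII byte. */
    public class AsciiCheckingInputStream extends FilterInputStream {
        private long offset = 0;
        public AsciiCheckingInputStream(InputStream in) {
            super(in);
        }
        public int read() throws IOException {
            int b = in.read();
            if (b > 127) {
                System.err.println("Non-ASCII byte 0x" + Integer.toHexString(b) + " at offset " + offset);
            }
            if (b != -1) offset++;
            return b;
        }
        public int read(byte[] buf, int off, int len) throws IOException {
            // Route everything through read() so every byte gets checked.
            int count = 0;
            while (count < len) {
                int b = read();
                if (b == -1) return count == 0 ? -1 : count;
                buf[off + count] = (byte) b;
                count++;
            }
            return count;
        }
    }
    Wrapping the stream handed to the parser, e.g. builder.parse(new AsciiCheckingInputStream(new URL(url).openStream())), should then show exactly where the offending byte sits.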

  • TransformerHandler throws OutOfMemoryError with large xml files

    I'm using a TransformerHandler to convert arbitrary content to SAX events and transform it using XSLT into an XML file.
    The problem is that for a large amount of content I get an OutOfMemoryError.
    It seems that the content is kept in memory and only flushed when I call handler.endDocument().
    I tried using auto-flush writers as the Result, and calling the flush() method myself, but nothing helped.
    Here is the example - please help!
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.sax.SAXTransformerFactory;
    import javax.xml.transform.sax.TransformerHandler;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.xml.sax.helpers.AttributesImpl;

    public class Test {
        /**
         * Test handler memory usage.
         * @param loops no. of loops - when large enough - OutOfMemoryError !!!
         * @param xsltFilePath xslt file
         * @param targetXmlFile output xml file
         * @throws Exception
         */
        public static void testHandlerMemUsage(int loops, String xsltFilePath, String targetXmlFile) throws Exception {
            // verify SAX support
            TransformerFactory factory = TransformerFactory.newInstance();
            if (!factory.getFeature(SAXTransformerFactory.FEATURE))
                throw new UnsupportedOperationException("SAX transformations not supported");
            TransformerHandler handler =
                ((SAXTransformerFactory) factory).newTransformerHandler(new StreamSource(xsltFilePath));
            handler.setResult(new StreamResult(targetXmlFile));
            handler.startDocument();
            handler.startElement(null, "root", "root", new AttributesImpl());
            // loop
            for (int i = 0; i < loops; i++) {
                handler.startElement(null, "el-" + i, "el-" + i, new AttributesImpl());
                handler.characters("value".toCharArray(), 0, "value".length());
                handler.endElement(null, "el-" + i, "el-" + i);
            }
            handler.endElement(null, "root", "root");
            // System.out.println("end document");
            // only after endDocument() does output start to appear..
            handler.endDocument();
            // System.out.println("ended document");
        }

        public static void main(String[] args) throws Exception {
            System.out.println("--starting..");
            testHandlerMemUsage(500000, "/copy.xslt", "/testHandlerMemUsage.xml");
            System.out.println("--we are still here -- increase loops..");
        }
    }

    Did you try increasing memory when starting Java with the -Xmx parameter? You know that Java uses only 64MB by default, so you might need to increase it to e.g. 256MB for your XML to work.
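    For example, the JVM could be started with something like the following (the class name is just a placeholder):
    java -Xms64m -Xmx256m Test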

  • Problem with loading XML file from directory.

    Hello everyone.
    *1)* I have a directory defined by the DBA. I have read and write privileges. I can read a file from this directory using UTL_FILE, and I can create a file in this directory using UTL_FILE. I tried many times and there do not seem to be any problems.
    *2)* I have a very simple XML table (with just one column of XMLType). I can insert into this column using:
    insert into temp_xml values (
    Xmltype ('<something></something>')
    );
    *3)* When executing
    insert into temp_xml values (
    Xmltype (
    bfilename('XML_LOCATION', 'sample.xml'),
    nls_charset_id('AL16UTF8')
    )
    );
    I'm receiving an error:
    Error report:
    SQL Error: ORA-22288: file or LOB operation FILEOPEN failed
    ORA-06512: at "SYS.DBMS_LOB", line 523
    ORA-06512: at "SYS.XMLTYPE", line 287
    ORA-06512: at line 1
    22288. 00000 - "file or LOB operation %s failed\n%s"
    *Cause:    The operation attempted on the file or LOB failed.
    *Action:   See the next error message in the error stack for more detailed
    information. Also, verify that the file or LOB exists and that
    the necessary privileges are set for the specified operation. If
    the error still persists, report the error to the DBA.
    *4)* Previously I was receiving more descriptive errors like permission denied, file does not exist, etc. This time there is no clear description apart from "file or LOB operation %s failed\n%s". I'm sure I can access this file in this directory (I used UTL_FILE to dbms_output the content).
    Any help would be greatly appreciated.
    Regards
    Marcin Jankowski

    Hi Marcin,
    Welcome to the forums.
    One very important thing with Oracle XML: please always give your database version, all four digits (e.g. 10.2.0.4).
    Does the directory reside on the same machine as the database? Which OS?
    Does any of the following work?
    DECLARE
       v_lob   CLOB;
       v_file  BFILE;
    BEGIN
       v_file := BFILENAME('XML_LOCATION','sample.xml');
       DBMS_LOB.createtemporary(v_lob, true);
       DBMS_LOB.fileopen(v_file);
       DBMS_LOB.loadfromfile(v_lob, v_file, DBMS_LOB.getlength(v_file));
       INSERT INTO temp_xml VALUES( xmltype(v_lob) );
       DBMS_LOB.fileclose(v_file);
       DBMS_LOB.freetemporary(v_lob);
    END;
    /
    DECLARE
       v_lob   CLOB;
    BEGIN
       v_lob := DBMS_XSLPROCESSOR.read2clob('XML_LOCATION', 'sample.xml', nls_charset_id('AL16UTF8'));
       INSERT INTO temp_xml VALUES( xmltype(v_lob) );
    END;
    /

  • Problems with creating XML file via Call Transformation

    Hi,
    When creating an XML file via CALL TRANSFORMATION, an extra character '#' is placed at the beginning of the file.
    This problem has occurred since the upgrade to ECC 6.0 and the Unicode conversion.
    When opening the XML file the following error message appears:
    Invalid at the top level of the document. Error processing resource 'file ....
    Does anybody have an idea why this extra character is placed at the beginning of the file? Does it have something to do with the Unicode conversion, and how can we solve the problem?
    thanks for your help
    kind regards,
    Maarten van IJzendoorn

    Hello Maarten,
    Can you please share the solution to this issue and let me know?
    Our Issue:
    1) We are executing a report which generates an XML file on FTP.
    2) The FTP file is always in error when executed through a JAPANESE login but not through an EN login.
    3) The XML files generated always have an extra character at the end (which can be a space, #, $%^&, etc.); when this extra character is removed from the XML file by opening it in NOTEPAD, the XML works OK in the JA login as well.
    4) In the PROGRAM everything has been checked with respect to the OPEN DATASET statement, XML ports, UNICODE, etc.
    5) This issue has been reported only after upgrading to ECC 6.0 from 4.6C (in the older version it works fine).
    The various OPEN DATASET statements are:
    OPEN DATASET path_fil
    FOR OUTPUT
    Thanks to reply.

  • Problem with Stack XML file

    Hello,
    We have an ECC 6 system with EHP 4 SPS 5 and we want to install a new Technical Usage (IS-OIL, IS-PRA and IS-UT).
    In order to do that I have to supply an XML file created by the MOPZ.
    I created a new MOPZ transaction, and in the phase "Update Options" I selected: Enhancement Package Installation.
    Afterwards, in Target Enhancement Package Stack, I selected: SAP ERP Enhancement Package 4 on NW7.01 - SP Stack 05.
    We do not want to upgrade the stack to a later one.
    I have selected Oil & Gas with Utilities and continued.
    The stack XML that was created contained SAP-APPL, SAP-BASIS and SAP-ABA at patch level 8. We do not want to upgrade these components.
    Is there a way to install a new Technical Usage without upgrading these components?
    And if there is, how can I create an XML file suited to this situation?
    Please advise,
    Zvi Gilinsky

    Hello,
    There is an equivalence of Support Package levels between EHP stacks and baseline ERP 6 stacks. Note 1064635 has a list of these equivalences.
    In any case, stack 5 for EHP4 of ERP should include stack 5 of NW 701.
    By your description it looks like you are having a Maintenance Optimizer issue. What is the Maintenance Optimizer stack level?
    Best regards,
    Miguel Ariñ

  • Problem with large text-files, HOWTO?

    Hi!
    I'm making an application which will search through a directory with 3000 HTML files and find all the links in those files.
    I have a text file with the format:
    file1: linktofile:linktofile6:linktofile5
    file2: linktofile1:linktofile87:
    and so on.
    This file will then be searched when I'm clicking hyperlinks in IExplorer. The problem is that this file is VERY long, both "horizontally and vertically". Is there a clever way to shorten it?

    If you have to search the entire contents of all 3000 files every time, then I don't see how that could be shortened. But if you have to search those files only for instances of "linktofile1295", for example, then you could redesign your text files into a database where you could access those instances directly via an index.
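    A rough sketch of that indexing idea, using an in-memory map as a stand-in for a real database index; the expected line format ("file1: linktofile2:linktofile3") follows the description above, and all names are illustrative:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.*;
    /** Builds an index from link target -> list of files that contain it. */
    public class LinkIndex {
        private final Map<String, List<String>> index = new HashMap<String, List<String>>();
        public void load(String path) throws Exception {
            BufferedReader in = new BufferedReader(new FileReader(path));
            String line;
            while ((line = in.readLine()) != null) {
                // Expected format: "file1: linktofile2:linktofile3"
                int colon = line.indexOf(':');
                if (colon < 0) continue;
                String file = line.substring(0, colon).trim();
                for (String link : line.substring(colon + 1).split(":")) {
                    link = link.trim();
                    if (link.length() == 0) continue;
                    List<String> files = index.get(link);
                    if (files == null) {
                        files = new ArrayList<String>();
                        index.put(link, files);
                    }
                    files.add(file);
                }
            }
            in.close();
        }
        /** Direct lookup instead of scanning the whole text file. */
        public List<String> filesLinkingTo(String link) {
            List<String> files = index.get(link);
            return files == null ? Collections.<String>emptyList() : files;
        }
    }
    A lookup then becomes a single map access instead of a scan over the whole file.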

  • Problems with large Photoshop files CC 2014

    Hi,
    Having a strange issue with the file size of some PSDs.
    I have two basically identical images. They are both the same size (same ppi and same measurements in cm). Both of them are single layer and there are no paths or other hidden stuff. Both are in sRGB as well. The issue here is that the file on the left is about 12 MB and the one on the right is almost twice that, and I can't for the life of me figure out why. As far as I'm concerned the image on the left looks a bit more "advanced" and should be the bigger one.
    This isn't just this image but most of the images I've been working on for the last couple of weeks. I chose this one because it was easy to compare to an older image of around the same size.
    I also received some images from Samsung a couple of days ago which were the same size as these ones but were 135 MB!!! After flattening the image to one layer it went down to about 25-27 MB, but that still feels like a lot to me.
    Have there been any changes to the way Photoshop handles file size in the latest version (CC 2014)?
    And before anyone asks, yes, they are saved with maximize compatibility.
    Any ideas? Sorry for the bad English, by the way.

    No, there are no changes to the handling of large files, or files in general.
    But mistakes in bit depth, cropping, layers, etc. could explain a file size difference for files that look sort of the same.

  • Begging for help with podcast xml file

    Hey All,
    I have a podcast on iTunes. I am hosting the XML file and podcast MP3s on a friend's server, so I'm not using any service. Everything is working and I've successfully added 4 podcasts so far, and it shows up correctly in iTunes on my PC.
    However, my podcasts do not show up correctly on the iTunes Store website or on iDevices. Meaning, I number my shows 001_"NAME", 002_"NAME", etc. Yet in the iTunes Store they show up out of order. So my last show is not at the top, it's at the bottom, and they are all mixed up (like 002, 001, 004, 003 instead of 4, 3, 2, 1). Also, the publish date is the same on two of them (and not what I have in the XML file) and doesn't show up at all on the other two. I assume this is a problem with the XML file, yet I don't see any problems with it. But it seems odd that it all works correctly in the actual iTunes program. I've tried different code, but I'm very much a noob at it, and everything I find online is from 5-10 years ago or wants you to host your podcasts with their site, and I don't need that.
    Here is the link to the show on the iTunes website so you can see what I mean:  http://itunes.apple.com/us/podcast/your-reality-recap/id501295325
    If anybody can, would you mind checking out the code in my XML file and letting me know if you see anything that's causing this issue?
    I zipped the XML file and put it here:  http://www.ericcurto.com/podcast/YRR.zip
    I would be truly grateful for any help with this. I've been trying to fix this for days and don't know what else to do.
    Thanks!
    Eric

    Your feed is at http://www.ericcurto.com/podcast/YourRealityRecap.xml (please always post the feed URL, not its contents or a copy).
    I don't see the issues you mention. The order in the Store and when subscribing is what I would expect:
    The order in the Store depends on clicking the column header: the default is the first one. Some of the dates are a day out - this is quite common and is probably a time zone issue (it may be different where you are - I'm in the UK). I don't know why you are seeing a garbled order unless you've clicked on one of the other columns in the Store.

  • Problems with large scanned images

    I have been giving Aperture another try since 1.1 came out, and I am still having problems with large tiff files derived from scanned 4x5 negatives. The files are 500mb or more, 16 bit RGB, with ProPhoto RGB or Ektaspace PS5 profiles, directly out of the scanner.
    Aperture imports the files correctly, and shows their thumbnails. When I select a thumbnail "Loading" is displayed briefly, and then the dreaded "Unsupported Image Format" is displayed. Sometimes "Loading" goes on for a while, and a geometric pattern (looking like a rendering of random memory) is displayed. Restarting Aperture doesn't help.
    Lower resolution (250mb, 16bit) files are handled properly. The scans are from an Epson 4870 scanner. I have tried pulling the scans into Photoshop and resaving with various tiff options, and as PSD with no improvement. I have the same problem with corrected/modified psd files coming out of Photoshop CS2.
    I am running on a Power Mac G5 dual 2ghz with 8gb of RAM and an NVIDIA GeForce 6800 GT DDL (250mb) video card, with all the latest OS and software updates.
    Has anyone else had similar problems? More importantly, is anyone else able to work with 500mb files of any kind? Is it my system, or is it the software? I sent feedback to Apple as well.
    dual g5 2ghz   Mac OS X (10.4.6)  

    I have a few (well actually about 100) scans on my system of >500Mb. I tried loading a few and am getting an inconsistent pattern of errors that correlates with what you are reporting.
    I imported 4 files and three were troubled, the fourth was OK. I imported another four files and the first one was OK and the three others had your reported error; also, the previously good file from the first import was now showing the same 'unsupported image' message.
    I would venture to say that if you shoot primarily 4x5 and work with scans of this size, Aperture is not the program for you--right now. I shoot 35mm and have a few images that I have scanned at 8000dpi on my Imacon 848, but most of my files are in the more reasonable 250Mb range (35mm @ 5000dpi).
    I will probably downsample my 8000dpi scans to 5000dpi and not worry too much about it. In a world where people believe that 16 megapixels is hi-res, you are obviously on the extreme side. (Good for you!) You should definitely file a bug report, but I wouldn't expect much help anytime soon for your super-sized scans.

  • Problem with parsing large XML files chunked over HTTP

    I'm trying to isolate a bug that was introduced when upgrading the JRE in use from Java 7u51 to 7u71 without changing any code. The problem appears to be very similar to: Bug ID: JDK-8027359 XML parser returns incorrect parsing results.
    Further investigation showed that it was also introduced in the same version (7u71) where that fix was applied. Unlike that bug, though, my XML is marked as version 1.0. It also appears to happen only with large XML files, on the order of 10MB or so.
    The closest I've been able to narrow it down is that the code is using JAXB to unmarshal a stream that the debugger tells me is an org.apache.http.conn.EofSensorInputStream / org.apache.http.impl.io.ChunkedInputStream. The exception I get is not consistent, but it typically appears to come from chunks being overwritten or shuffled, resulting in letters appearing in attributes that are actually numbers, or, as in the following, an attribute "testAttribute" gets partially overwritten by the end of a timestamp that was in a different section of the XML.
    javax.xml.bind.UnmarshalException
    - with linked exception:
    [javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.]
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:421)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:357)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,98748]
    Message: Attribute name "testAttribu00Z" associated with an element type "testElement" must be followed by the ' = ' character.
      at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:598)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:181)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      ... 6 more
    Here's some code that seems to reproduce it if you can connect to an XML server that returns a large chunked XML file:
      SchemeRegistry registry = new SchemeRegistry();
      registry.register(
                    new Scheme("http", 80, PlainSocketFactory.getSocketFactory()));
      HttpClient client = new DefaultHttpClient(new BasicClientConnectionManager(registry));
      String url = "http://someUrlReturningAlargeChunkedXML";
      HttpGet method = new HttpGet(url);
      HttpResponse response = client.execute(method);
      InputStream inputStream = response.getEntity().getContent();
      XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
      JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);
    If you connect using URL.openStream() to the same service there is no error. If I read bytes directly and write to a file, there is no error. The error only happens when I try to unmarshal it, and it's large, and I'm using Java 7u71 (or later). It can be consistently repeated with the jsp webapp that I'm using, but didn't show the error when I used the same code with a Wikipedia dump XML file.
    How can I unmarshal in a different way to avoid this problem? Or, how can I better isolate the bug so it can be posted to the appropriate bug system?

    Apparently, adding the Woodstox XML libraries avoids the bug. Is there anyone who can reproduce this on another system? Were there any changes to the StAX implementation between u67 and u71 that may have introduced a bug like this?
    Edit: When setting the logging level to DEBUG, I once saw the overwritten buffer being logged as if that was what was received (as in the testAttribu00Z example above). I can't repeat that anymore though, and very rarely it does parse with no exception (though it may still have been corrupted). Now the error seems to be consistently on one of the buffer boundaries, as in:
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "trend>....OTHER XML...<trend hours=""
    17:08:09,705 DEBUG wire:77 - << "634.0972777777778" datetime="2013-05-21T00:43:48.350Z" t"
    17:08:09,705 DEBUG wire:63 - << "[\r][\n]"
    17:08:09,705 DEBUG wire:63 - << "2000[\r][\n]"
    17:08:09,705 DEBUG wire:77 - << "rend-mode="0">
    Exception in thread "main" java.lang.NumberFormatException: t34.0972777777778
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseDouble(DatatypeConverterImpl.java:213)
      at mypackage.Trend_JaxbXducedAccessor_hours.parse(TransducedAccessor_field_Double.java:48)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StructureLoader.startElement(StructureLoader.java:194)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:486)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:465)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:355)
      at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:334)
    Or:
    17:19:12,563 DEBUG wire:63 - << "2000[\r][\n]"
    17:19:12,563 DEBUG wire:77 - << ...OTHER XML...<trend index="5"
    17:19:12,563 DEBUG wire:77 - << "" label="N"
    17:19:12,563 DEBUG wire:63 - << "[\r][\n]"
    Exception in thread "main" java.lang.NumberFormatException: Not a number: N
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseInt(DatatypeConverterImpl.java:106)
      at com.sun.xml.internal.bind.DatatypeConverterImpl._parseShort(DatatypeConverterImpl.java:118)
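    If the Woodstox workaround is used, a minimal sketch of forcing that StAX implementation instead of the JDK's built-in one could look like this (assumption: the Woodstox and StAX2 API jars are on the classpath; inputStream and unmarshaller are the same variables as in the snippet above):
      import javax.xml.stream.XMLInputFactory;
      import javax.xml.stream.XMLStreamReader;
      import com.ctc.wstx.stax.WstxInputFactory;
      // Instantiate the Woodstox factory explicitly rather than relying on
      // whichever StAX implementation XMLInputFactory.newInstance() picks up.
      XMLInputFactory factory = new WstxInputFactory();
      XMLStreamReader responseReader = factory.createXMLStreamReader(inputStream);
      JAXBElement<JaxBObjectOfResponse> wot = unmarshaller.unmarshal(responseReader, JaxBObjectOfResponse.class);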

  • Problem with parsing large xml files

    Hello All,
    I am parsing a large XML file of 20MB and I use DocumentBuilder.parse(File). This method works for small XML files with size less than 20MB, but the application hangs and doesn't throw any error message when parsing 20MB XML files. Please let me know what I have to do at this point.
    Thanks & Regards,
    Kumar.

    Well... I can't agree.
    If you have such structure:
    <task>
      <task/>
      <task>
         <task>
            <task/>
         </task>
         <task/>
      </task>
    </task>
    ...you may always keep a stack of tasks (push at startElement and pop at endElement), so at every leaf of the tree you will have all the parents of that leaf (see the sketch below).
    For such a structure:
    <task id="1" parent="0"/>
    <task id="2" parent="1"/>
    <task id="3" parent="1"/>
    <task id="4" parent="2"/>
    <task id="5" parent="3"/>
    ...it will be much faster to go through the document with SAX several times to build the tree of tasks than to load the whole document into memory...
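    A minimal sketch of the stack-of-tasks idea for the nested form, assuming a SAX DefaultHandler (the element name and the printout are illustrative):
    import java.util.ArrayDeque;
    import java.util.Deque;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;
    public class TaskStackHandler extends DefaultHandler {
        // The stack always holds the chain of currently open elements.
        private final Deque<String> stack = new ArrayDeque<String>();
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            stack.push(qName); // push on entry
            if ("task".equals(qName)) {
                // Everything below the top of the stack is an ancestor of this task.
                System.out.println("task at depth " + stack.size() + ", open elements: " + stack);
            }
        }
        public void endElement(String uri, String localName, String qName) {
            stack.pop(); // pop on exit
        }
    }
    It would be fed to a parser with something like SAXParserFactory.newInstance().newSAXParser().parse(file, new TaskStackHandler()).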

  • Performance Problem in parsing large XML file (15MB)

    Hi,
    I'm trying to parse a large XML file (15 MB) and am facing a clear performance problem. A simple XML validation using the following code snippet:
    DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
    DBMS_LOB.loadClobfromFile(
      tempCLOB,
      targetFile,
      DBMS_LOB.getLength(targetFile),
      dest_offset,
      src_offset,
      nls_charset_id(CONSTANT_CHARSET),
      lang_context,
      conv_warning
    );
    DBMS_LOB.fileclose(targetFile);
    p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
    p_xml_document.schemaValidate();
    is taking 30 minutes on an HP-UX (4GB RAM, 2 CPU) machine (Oracle version: 9.2.0.4).
    Please explain what could be going wrong.
    Thanks In Advance,
    Vineet

    Thanks Mark,
    I'll open a TAR and also upload the schema and instance XML.
    If I'm not changing track too much :-) one more thing in continuation:
    If I skip the schema validation step and directly insert the instance document into a schema-linked XMLType table, what does Oracle XDB do in such a case?
    I'm getting a severe performance hit here too... the same file as above takes almost 40 minutes to insert.
    code snippet:
    DBMS_LOB.fileopen(targetFile, DBMS_LOB.file_readonly);
    DBMS_LOB.loadClobfromFile(
      tempCLOB,
      targetFile,
      DBMS_LOB.getLength(targetFile),
      dest_offset,
      src_offset,
      nls_charset_id(CONSTANT_CHARSET),
      lang_context,
      conv_warning
    );
    DBMS_LOB.fileclose(targetFile);
    p_xml_document := XMLType(tempCLOB, p_schema_url, 0, 0);
    -- p_xml_document.schemaValidate();
    insert into INCOMING_XML values(p_xml_document);
    Here table INCOMING_XML is:
    TABLE of SYS.XMLTYPE(XMLSchema "http://INCOMING_XML.xsd" Element "MatchingResponse") STORAGE Object-relational TYPE "XDBTYPE_MATCHING_RESPONSE"
    This table and type XDBTYPE_MATCHING_RESPONSE were created using the mapping provided in the registered XML Schema.
    Thanks,
    Vineet

  • OSB - Iterating over large XML files with content streaming

    Hi @ll
    I have to iterate over all items in large XML files and insert them into an Oracle database.
    The file is about 200 MB and contains around 500'000 items, and I am using OSB 10gR3.
    The XML structure is something like this:
    <allItems>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    <item>.....</item>
    </allItems>
    Actually I thought about using a proxy service with content streaming enabled and a "for each" action for iterating
    over all items. But for this the whole XML structure has to be materialized into a variable, otherwise it is not possible!
    More about streaming large files can be found here:
    [http://download.oracle.com/docs/cd/E13159_01/osb/docs10gr3/userguide/context.html#large_messages]
    It says: "When you enable streaming for large message processing, you cannot use the ... for each...".
    And for accessing single items you should use an assign action with an XPath like "$body/allItems/item[1]";
    this works fine, and the whole XML stream does not have to be materialized.
    So my idea was to use the "for each" action and process all items sequentially with an XPath like:
    $body/allItems/item[$counter]
    But the "for each" action just allows iterating over a sequence of XML items by defining a selection XPath
    and the variable that contains all items. I would like to have a "repeat until" construct that iterates as long
    as $body/allItems/item[$counter] returns a non-null result. Or can I use the "for each" action differently?
    Does the OSB provide any other iterating mechanism? I know there is the split-join construct that supports
    different looping techniques, but as far as I know it does not support content streaming; is this correct?
    Did I miss something?
    Thanks a lot for helping!
    Cheers
    Dani
    Edited by: user10095731 on 29.07.2009 06:41

    Hi Dani,
    Yes, in my opinion this would be the best approach. You can use content streaming to pass this large XML to the EJB, and once it is passed successfully the EJB should operate on it. If you want any result back (for further routing), you can get it back from the EJB.
    The EJB gives you the power of Java to process this file, and from a Java perspective 150 MB is not a very large amount of data. Ensure that you are using buffering. Check out this link for an explanation of Java IO streams and, in particular, buffered streams -
    http://java.sun.com/developer/technicalArticles/Streams/ProgIOStreams/
    Try dom4j with the XPP (XML Pull Parser) in case you have a parsing requirement. We have worked with a 1.2GB file using this technique.
    Regards,
    Anuj
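    Not OSB-specific, but as a sketch of the pull-parsing idea in plain Java: the standard StAX API streams the document and touches one <item> at a time (dom4j with XPP follows the same pattern). The file name is an assumption; the element names match the <allItems>/<item> example above.
    import java.io.FileInputStream;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;
    public class ItemStreamer {
        public static void main(String[] args) throws Exception {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new FileInputStream("allItems.xml"));
            int items = 0;
            while (reader.hasNext()) {
                // Only START_ELEMENT events for <item> are of interest; nothing else is kept in memory.
                if (reader.next() == XMLStreamConstants.START_ELEMENT && "item".equals(reader.getLocalName())) {
                    items++; // handle one <item> here, e.g. insert it into the database
                }
            }
            reader.close();
            System.out.println("Items processed: " + items);
        }
    }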
