XMLTABLE / XQUERY performance
Hi all,
Below is a sample XML representing a spreadsheet :
<Table>
<Row>
<Cell><Data>1</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>2</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
</Row>
<Row>
<Cell><Data>3</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>4</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>5</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell Index="4"><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
</Table>
which should be interpreted as:
cols --> 1  2  3  4  5
         1  A  B     D
         2     B  C
         3     B     D
         4  A  B  C  D
         5  A     C  D
As you can see, for each row, empty cells are simply omitted in the document.
The next non-empty cell is then marked with an Index attribute representing its true position on the row.
My requirement is to query the document and access the values based on each cell's position in the row.
Because of the empty cells, Data values cannot be accessed by a simple XPath like "/Cell[n]/Data".
So I came up with the following:
WITH t AS (
select xmltype(
'<Table>
<Row>
<Cell><Data>1</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>2</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
</Row>
<Row>
<Cell><Data>3</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>4</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>5</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell Index="4"><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
</Table>') doc from dual
)
SELECT x.*
FROM t, xmltable(
'for $j in /Table/Row
return <ROW> {
for $i at $pos in $j/Cell
let $x := $j/Cell[position()<=$pos and @Index][last()]/@Index
let $x2 := if($x) then $x else 1
let $p := count($j/Cell[@Index=$x]/preceding-sibling::*)+1
return <DATA pos="{$pos - $p + $x2}">{$i/Data/text()}</DATA>
} </ROW>'
PASSING t.doc
COLUMNS cell1 number PATH '/ROW/DATA[@pos="1"]',
cell2 varchar2(1) PATH '/ROW/DATA[@pos="2"]',
cell3 varchar2(1) PATH '/ROW/DATA[@pos="3"]',
cell4 varchar2(1) PATH '/ROW/DATA[@pos="4"]',
cell5 varchar2(1) PATH '/ROW/DATA[@pos="5"]'
) x;
Basically, the XQuery reconstructs each row and gives each cell its true position. We can then access the data with a simple position predicate.
It works well for small documents, but rapidly shows awful performance as the size increases (which is understandable).
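To make the positional rule concrete, here is a minimal sketch in Python (not part of the query above, and not Oracle code). It assumes the rule exactly as described: a cell with an Index attribute sits at that position, and any other cell sits one position after the previous cell. Under that assumption one forward pass per row is enough, whereas the XQuery looks back over the preceding cells for every cell, which is presumably where much of the cost comes from. The function name `true_positions` is only illustrative.

```python
import xml.etree.ElementTree as ET

def true_positions(row):
    """Map each Cell of a Row element to its true 1-based column position."""
    cells = {}
    pos = 0
    for cell in row.findall("Cell"):
        index = cell.get("Index")
        # An Index attribute resets the running position; otherwise the
        # cell simply follows the previous one.
        pos = int(index) if index is not None else pos + 1
        cells[pos] = cell.findtext("Data")
    return cells

# First two rows of the sample document from the post.
doc = ET.fromstring("""
<Table>
  <Row>
    <Cell><Data>1</Data></Cell>
    <Cell><Data>A</Data></Cell>
    <Cell><Data>B</Data></Cell>
    <Cell Index="5"><Data>D</Data></Cell>
  </Row>
  <Row>
    <Cell><Data>2</Data></Cell>
    <Cell Index="3"><Data>B</Data></Cell>
    <Cell><Data>C</Data></Cell>
  </Row>
</Table>
""")

rows = [true_positions(row) for row in doc.findall("Row")]
for cells in rows:
    # Empty columns show up as None.
    print([cells.get(col) for col in range(1, 6)])
```

For the two rows above this prints `['1', 'A', 'B', None, 'D']` and `['2', None, 'B', 'C', None]`, matching the grid shown earlier.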
So my question : is there a better way to achieve the requirement (or to improve performance)?
Thanks a lot.
(DB version is 10.2.0.4)
Edited by: odie_63 on 28 Dec. 2009 19:21
What if you do the logic outside of the XQuery?
Like this:
SQL> set timi on;
SQL> WITH t AS (
select xmltype(
'<Table>
<Row>
<Cell><Data>1</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>2</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
</Row>
<Row>
<Cell><Data>3</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>4</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>5</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell Index="4"><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
</Table>') doc from dual
)
SELECT x.*
FROM t, xmltable(
'for $j in /Table/Row
return <ROW> {
for $i at $pos in $j/Cell
let $x := $j/Cell[position()<=$pos and @Index][last()]/@Index
let $x2 := if($x) then $x else 1
let $p := count($j/Cell[@Index=$x]/preceding-sibling::*)+1
return <DATA pos="{$pos - $p + $x2}">{$i/Data/text()}</DATA>
} </ROW>'
PASSING t.doc
COLUMNS cell1 number PATH '/ROW/DATA[@pos="1"]',
cell2 varchar2(1) PATH '/ROW/DATA[@pos="2"]',
cell3 varchar2(1) PATH '/ROW/DATA[@pos="3"]',
cell4 varchar2(1) PATH '/ROW/DATA[@pos="4"]',
cell5 varchar2(1) PATH '/ROW/DATA[@pos="5"]'
) x;
     CELL1 C C C C
         1 A B   D
         2   B C
         3   B   D
         4 A B C D
         5 A   C D
Elapsed: 00:00:00.64
SQL>
SQL> WITH t AS (
select xmltype('<Table>
<Row>
<Cell><Data>1</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>2</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
</Row>
<Row>
<Cell><Data>3</Data></Cell>
<Cell Index="3"><Data>B</Data></Cell>
<Cell Index="5"><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>4</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell><Data>B</Data></Cell>
<Cell><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
<Row>
<Cell><Data>5</Data></Cell>
<Cell><Data>A</Data></Cell>
<Cell Index="4"><Data>C</Data></Cell>
<Cell><Data>D</Data></Cell>
</Row>
</Table>') doc from dual)
select
  case when v.cell_1_index is null then v.cell_1 end cell_one
, case when nvl(v.cell_1_index,-1)=2 then v.cell_1
       when v.cell_2_index is null and v.cell_1_index is null then v.cell_2
  end cell_two
, case when nvl(v.cell_1_index,-1)=3 then v.cell_1
       when nvl(v.cell_2_index,-1)=3 then v.cell_2
       when v.cell_1_index=2 and v.cell_2_index is null then v.cell_2
       when coalesce(v.cell_1_index,v.cell_2_index,v.cell_3_index) is null then v.cell_3
  end cell_three
, case when nvl(v.cell_1_index,-1)=4 then v.cell_1
       when nvl(v.cell_2_index,-1)=4 then v.cell_2
       when nvl(v.cell_3_index,-1)=4 then v.cell_3
       when v.cell_1_index=2 and v.cell_2 is not null and v.cell_3_index is null then v.cell_3
       when v.cell_1_index=3 and v.cell_2 is not null and v.cell_4_index is null then v.cell_2
       when v.cell_2_index=3 and v.cell_3_index is null then v.cell_3
       when coalesce(v.cell_1_index,v.cell_2_index,v.cell_3_index,v.cell_4_index) is null then v.cell_4
  end cell_four
, case when nvl(v.cell_1_index,-1)=5 then v.cell_1
       when nvl(v.cell_2_index,-1)=5 then v.cell_2
       when nvl(v.cell_3_index,-1)=5 then v.cell_3
       when nvl(v.cell_4_index,-1)=5 then v.cell_4
       when v.cell_1_index=3 and v.cell_2_index is null then v.cell_3
       when v.cell_1_index=4 then v.cell_2
       when v.cell_3_index=4 then v.cell_4
       when v.cell_2_index=3 and v.cell_3_index is null and v.cell_4_index is null then v.cell_4
       when coalesce(v.cell_1_index,v.cell_2_index,v.cell_3_index,v.cell_4_index) is null then v.cell_5
  end cell_five
from
t,xmltable('/Table/Row'
passing t.doc
columns
 cell_1 varchar2(10) path 'Cell[1]/Data'
,cell_1_index number path 'Cell[1]/@Index'
,cell_2 varchar2(10) path 'Cell[2]/Data'
,cell_2_index number path 'Cell[2]/@Index'
,cell_3 varchar2(10) path 'Cell[3]/Data'
,cell_3_index number path 'Cell[3]/@Index'
,cell_4 varchar2(10) path 'Cell[4]/Data'
,cell_4_index number path 'Cell[4]/@Index'
,cell_5 varchar2(10) path 'Cell[5]/Data'
) v;
CELL_ONE   CELL_TWO   CELL_THREE CELL_FOUR  CELL_FIVE
1          A          B                     D
2                     B          C
3                     B                     D
4          A          B          C          D
5          A                     C          D
Elapsed: 00:00:00.04
SQL> spool off;
Edited by: Ants Hindpere on Mar 10, 2010 1:03 AM
Similar Messages
-
EXTREMELY SLOW XQUERY PERFORMANCE AND SLOW DOCUMENT INSERTS
Resolution History
12-JUN-07 15:01:17 GMT
### Complete Problem Description ###
A test file is being used to do inserts into a schemaless XML DB. The file is inserted and then links are made to 4 different collection folders under /public. The inserts are pretty slow (about 15 per second, and the file is small), but the XQuery doesn't even complete when there are 500 documents to query against.
The same XQuery has been tested on a competitor's system and it has lightning-fast performance there. I know it should likewise be fast on Oracle, but I haven't been able to figure out what is going on, except that I suspect a cartesian product is somehow the result of the query on Oracle.
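To illustrate what such a cartesian product would mean in practice, here is a minimal Python sketch. It is only an analogy, not a description of how Oracle actually evaluates the query listed below. The query pairs each event with the membership documents whose entity ID equals the event's uid. If the inner lookup is re-run for every event, the number of comparisons is the product of the two input sizes. A keyed lookup reads each input once. The field names `uid`, `entity_id` and `context`, and all sample values except the uid taken from the sample output, are hypothetical.

```python
from collections import defaultdict

def join_nested_loop(events, memberships):
    """For each event, rescan every membership: len(events) * len(memberships) comparisons."""
    return [
        (event, [m for m in memberships if m["entity_id"] == event["uid"]])
        for event in events
    ]

def join_with_lookup(events, memberships):
    """Build a lookup keyed by entity ID once, then probe it once per event."""
    by_entity = defaultdict(list)
    for m in memberships:
        by_entity[m["entity_id"]].append(m)
    return [(event, by_entity.get(event["uid"], [])) for event in events]

# Hypothetical stand-ins for the event and OpContextMembership documents.
events = [{"uid": "XXX541113454"}, {"uid": "XXX000000001"}]
memberships = [
    {"entity_id": "XXX541113454", "context": "ctx-1"},
    {"entity_id": "XXX541113454", "context": "ctx-2"},
]

result = join_with_lookup(events, memberships)
# Both strategies return the same pairs; only the amount of work differs.
assert result == join_nested_loop(events, memberships)
```

With 500 documents the nested-loop form performs on the order of 500 x 500 comparisons, which fits the symptom described above if each comparison also requires parsing a document.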
### SQLXML, XQUERY, PL/SQL syntax used ###
Here is the key plsql code that calls the DBMS_XDB procedures:
CREATE OR REPLACE TYPE "XDB"."RESOURCEARRAY" AS VARRAY(500) OF VARCHAR2(256);
PROCEDURE AddOrReplaceResource(
resourceUri VARCHAR2,
resourceContents SYS.XMLTYPE,
public_collections in ResourceArray
) AS
b BOOLEAN;
privateResourceUri path_view.path%TYPE;
resource_exists EXCEPTION;
pragma exception_init(resource_exists,-31003);
BEGIN
/* Store the document in private folder */
privateResourceUri := GetPrivateResourceUri(resourceUri);
BEGIN
b := dbms_xdb.createResource(privateResourceUri, resourceContents);
EXCEPTION
WHEN resource_exists THEN
DELETE FROM resource_view WHERE equals_path(res, privateResourceUri)=1;
b := dbms_xdb.createResource(privateResourceUri, resourceContents);
END;
/* add a link in /public/<collection-name> for each collection passed in */
FOR i IN 1 .. public_collections.count LOOP
BEGIN
dbms_xdb.link(privateResourceUri,public_collections(i),resourceUri);
EXCEPTION
WHEN resource_exists THEN
dbms_xdb.deleteResource(concat(concat(public_collections(i),'/'),resourceUri));
dbms_xdb.link(privateResourceUri,public_collections(i),resourceUri);
END;
END LOOP;
COMMIT;
END;
FUNCTION GetPrivateResourceUri(
resourceUri VARCHAR2
) RETURN VARCHAR2 AS
BEGIN
return concat('/ems/docs/',REGEXP_SUBSTR(resourceUri,'[a-zA-z0-9.-]*$'));
END;
### Info for XML Querying ###
Here is the XQuery and a sample of the output follows:
declare namespace c2ns="urn:xmlns:NCC-C2IEDM";
for $cotEvent in collection("/public")/event
return
<cotEntity>
{$cotEvent}
{for $d in collection("/public")/c2ns:OpContextMembership[c2ns:Entity/c2ns:EntityIdentifier
/c2ns:EntityId=xs:string($cotEvent/@uid)]
return
$d}
</cotEntity>
Sample output:
<cotEntity><event how="m-r" opex="o-" version="2" uid="XXX541113454" type="a-h-G-" stale="2007-03-05T15:36:26.000Z"
start="2007-03-05T15:36:26.000Z" time="2007-03-05T15:36:26.000Z"><point ce="" le="" lat="5.19098483230079" lon="-5.333597827082126"
hae="0.0"/><detail><track course="26.0" speed="9.26"/></detail></event></cotEntity>
19-JUN-07 04:34:27 GMT
UPDATE
=======
Hi Arnold,
you wrote -
Please use Sun JDK 1.5 java to perform the test case.
Right now I have -
$ which java
/usr/bin/java
$ java -version
java version "1.4.2"
gcj (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
Sorry, as I told you before, I am not very knowledgeable in Java. Can you tell me what settings I need to change to make use of Sun JDK 1.5? Please note I am testing on Linux. Do I need to test this on a Sun box? Can it not be modified to run on Linux?
Thanks,
Rakesh
STATUS
=======
@CUS -- Waiting for requested information -
XQuery performance in Oracle 10gR2
Hello
I'm currently trying to measure the performance of XQuery FLWOR queries on Oracle 10gR2. For that, I've created a simple table with one integer field (ID) and one XMLType field. This XMLType field contains 10000 documents. The size of these documents varies between approximately 2 KB and 14 KB.
A simple XQuery like the one below (without a "WHERE" or complex "RETURN" clause) runs quite well, and I get the query results in a reasonable time.
SELECT xtab.COLUMN_VALUE
FROM contractXDraft, XMLTABLE(
'declare namespace ctxCD="contractX/contractXDraft";for $x in /ctxCD:contractXDraft
return
<response>
Hello
</response>'
PASSING OBJECT_VALUE) xtab;
On the other hand, if I add a "WHERE" clause to filter the results, the query execution time becomes very long (hours), and I always have to abort the query by killing the "oracle" process because the page file memory used by this process increases linearly! Here is such a query:
SELECT xtab.COLUMN_VALUE
FROM contractXDraft, XMLTABLE(
'declare namespace ctxCD="contractX/contractXDraft";for $x in /ctxCD:contractXDraft
where $x/ctxCD:ReferenceDetails/ctxCD:ContractHeaderDetails/ctxCD:ContractNumber = "19163-contract657-2.xml"
return
<response>
Hello
</response>'
PASSING OBJECT_VALUE) xtab;
The queries above are executed on a server dedicated to these tests. This server is a Pentium IV 3.2 GHz with 1 GB of RAM, running Windows 2003 Server Enterprise SP1.
I would like to know if someone can explain this huge difference in execution time between the two queries above. Is it a syntax mistake? Is it a hardware problem?
Thank you very much for your help!
Regards
MD
MD,
Sounds like your XMLType field is using unstructured storage (i.e., LOB-based) instead of structured storage (i.e., O-R based). You can check out the XML DB online doc (http://download-west.oracle.com/docs/cd/B19306_01/appdev.102/b14259/xdb01int.htm#BABECDCF) to learn more about the differences between the two. Essentially, without any associated indexes, using XQuery over unstructured storage will result in a full table scan, which can be very slow when there is a large number of rows.
Regards,
Geoff -
10g vs 11g xquery performance with XBRL
Finally, I set up 11g on a small notebook with 1 GB of memory.
The result was impressive compared to 10g, but I need more than that.
I used this query, which generates 761 rows, for testing:
SELECT c.seqno,xt.ns,xt.name,nvl(xt.lang,'na') as lang,xt.unit,xt.decimals,
xt.value
FROM FINES_CTX c,FINES_XBRL_CLOB r,
XMLTABLE(
XMLNAMESPACES(
'http://www.xbrl.org/2003/linkbase' AS "link",
'http://www.w3.org/1999/xlink' AS "xlink",
'http://www.w3.org/2001/XMLSchema' AS "xsd",
'http://www.xbrl.org/2003/instance' AS "xbrli",
'http://fss.xbrl.or.kr/kr/br/f/aa/2007-06-30' AS
"fines-f-aa",
'http://fss.xbrl.or.kr/kr/br/b/aa/2007-06-30' AS
"fines-b-aa",
'http://fss.xbrl.or.kr/kr/br/f/ad/2007-06-30' AS
"fines-f-ad",
'http://fss.xbrl.or.kr/kr/br/b/ad/2007-06-30' AS
"fines-b-ad",
'http://fss.xbrl.or.kr/kr/br/f/af/2007-06-30' AS
"fines-f-af",
'http://fss.xbrl.or.kr/kr/br/b/af/2007-06-30' AS
"fines-b-af",
'http://fss.xbrl.or.kr/kr/br/f/ai/2007-06-30' AS
"fines-f-ai",
'http://fss.xbrl.or.kr/kr/br/b/ai/2007-06-30' AS
"fines-b-ai",
'http://fss.xbrl.or.kr/kr/br/f/ak/2007-06-30' AS
"fines-f-ak",
'http://fss.xbrl.or.kr/kr/br/b/ak/2007-06-30' AS
"fines-b-ak",
'http://fss.xbrl.or.kr/kr/br/f/bs/2007-06-30' AS
"fines-f-bs",
'http://fss.xbrl.or.kr/kr/br/b/bs/2007-06-30' AS
"fines-b-bs",
'http://xbrl.org/2005/xbrldt' AS "xbrldt",
'http://www.xbrl.org/2004/ref' AS "ref",
'http://www.xbrl.org/2003/XLink' AS "xl"),
'for $item in $doc/xbrli:xbrl/*[not(starts-with(name(),"xbrli:")) and not(starts-with(name(),"link:"))]
where $item/@contextRef
return <item decimals="{$item/@decimals}" contextRef="{$item/@contextRef}" xml:lang="{$item/@xml:lang}" unitRef="{$item/@unitRef}" name="{local-name($item)}" ns="{namespace-uri($item)}">{$item/text()}</item>'
PASSING r.xbrl as "doc"
COLUMNS context_id varchar2(128) PATH '@contextRef',
ns varchar2(128) PATH '@ns',
name varchar2(128) PATH '@name',
lang varchar2(2) PATH '@xml:lang',
unit varchar2(16) PATH '@unitRef',
decimals varchar2(64) PATH '@decimals',
value varchar(256) PATH '.'
) xt
WHERE c.report_cd = r.report_cd and c.finance_cd = r.finance_cd and
c.base_month = r.base_month and c.gubn_cd = r.gubn_cd
and c.seqno = 109299 and c.context_id = xt.context_id
All the tables have 500 rows and a non-schema-based XMLType column.
FINES_XBRL_CLOB - xmltype stored as clob
FINES_XBRL_BINARY - xmltype stored as binary with xml index
FINES_XBRL_BINARY_NI - xmltype stored as binary without xml index.
case 1 : run on 10g with XMLType stored as CLOB
time: took 1270 secs.- quite disappointed.
plan: 0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=26 Card=82 Bytes=173K)
1 0 NESTED LOOPS (Cost=26 Card=82 Bytes=173K)
2 1 NESTED LOOPS (Cost=2 Card=1 Bytes=2K)
3 2 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_CTXB' (TABLE) (Cost=1 Card=1 Bytes=119)
4 3 INDEX (UNIQUE SCAN) OF 'PK_FINES_CTXB' (INDEX (UNIQUE)) (Cost=1 Card=1)
5 2 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_XBRLB' (TABLE) (Cost=1 Card=82 Bytes=164K)
6 5 INDEX (UNIQUE SCAN) OF 'PK_FINES_XBRLB' (INDEX (UNIQUE)) (Cost=0 Card=1)
7 1 COLLECTION ITERATOR (PICKLER FETCH) OF 'SYS.XQSEQUENCEFROMXMLTYPE' (PROCEDURE)
case 2: run on 11g with XMLType stored as CLOB
time: took 27 secs. - almost 50 times faster
plan:
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=114 Card=1 Bytes=2K)
1 0 FILTER
2 1 NESTED LOOPS (Cost=32 Card=82 Bytes=173K)
3 2 NESTED LOOPS (Cost=3 Card=1 Bytes=2K)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_CTX' (TABLE) (Cost=2 Card=1 Bytes=119)
5 4 INDEX (UNIQUE SCAN) OF 'PK_FINES_CTX' (INDEX (UNIQUE)) (Cost=1 Card=1)
6 3 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_XBRL_CLOB' (TABLE) (Cost=1 Card=5K Bytes=10M)
7 6 INDEX (UNIQUE SCAN) OF 'PK_FINES_XBRL_CLOB' (INDEX (UNIQUE)) (Cost=0 Card=1)
8 2 COLLECTION ITERATOR (PICKLER FETCH) OF 'SYS.XMLSEQUENCEFROMXMLTYPE' (PROCEDURE)
9 1 COLLECTION ITERATOR (PICKLER FETCH) OF 'SYS.XQSEQUENCEFROMXMLTYPE' (PROCEDURE)
case 3: run on 11g with XMLType stored as BINARY no XMLIndex
time: 10 secs (9.6 sec exactly), about 130 times faster than case 1.
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=113 Card=1 Bytes=2K)
1 0 FILTER
2 1 NESTED LOOPS (Cost=33 Card=80 Bytes=169K)
3 2 NESTED LOOPS (Cost=3 Card=1 Bytes=2K)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_CTX' (TABLE) (Cost=2 Card=1 Bytes=119)
5 4 INDEX (UNIQUE SCAN) OF 'PK_FINES_CTX' (INDEX (UNIQUE)) (Cost=1 Card=1)
6 3 TABLE ACCESS (BY INDEX ROWID) OF 'FINES_XBRL_BINARY_NI' (TABLE) (Cost=1 Card=82 Bytes=164K)
7 6 INDEX (UNIQUE SCAN) OF 'PK_FINES_BINARY_XBRL_NI' (INDEX (UNIQUE)) (Cost=0 Card=1)
8 2 XPATH EVALUATION
9 1 XPATH EVALUATION
CREATE INDEX fines_xbrl_binary_ix ON fines_xbrl_binary (xbrl) INDEXTYPE IS XDB.XMLIndex
case 4: run on 11g with XMLType stored as BINARY and XMLIndex
time: 574 secs. - oops...not good.
plan: quite long..
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=16 Card=1 Bytes=5K)
1 0 FILTER
2 1 NESTED LOOPS
3 2 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
4 3 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
5 4 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
6 3 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
7 2 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
8 0 FILTER
9 8 NESTED LOOPS
10 9 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
11 10 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
12 11 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
13 10 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
14 9 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
15 0 FILTER
16 15 NESTED LOOPS
17 16 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
18 17 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
19 18 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
20 17 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
21 16 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
22 0 FILTER
23 22 NESTED LOOPS
24 23 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
25 24 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
26 25 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
27 24 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
28 23 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
29 0 SORT (AGGREGATE) (Card=1 Bytes=3K)
30 29 FILTER
31 30 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=5 Card=32 Bytes=110K)
32 31 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_ORDKEY_IX' (INDEX) (Cost=3 Card=92)
33 0 FILTER
34 33 NESTED LOOPS
35 34 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
36 35 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
37 36 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
38 35 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
39 34 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
40 0 FILTER
41 40 NESTED LOOPS
42 41 NESTED LOOPS (Cost=4 Card=1 Bytes=4K)
43 42 TABLE ACCESS (BY INDEX ROWID) OF 'XDB.X$PT1MP1MWL3978FCE0G24J0CM85AM' (TABLE) (Cost=0 Card=1 Bytes=1008)
44 43 INDEX (RANGE SCAN) OF 'XDB.X$PR1MP1MWL3978FCE0G24J0CM85AM' (INDEX (UNIQUE)) (Cost=0 Card=1)
45 42 INDEX (RANGE SCAN) OF 'SYS69876_FINES_XBRL_PATHID_IX' (INDEX) (Cost=2 Card=3)
46 41 TABLE ACCESS (BY INDEX ROWID) OF 'SYS69876_FINES_XBRL_PATH_TABLE' (TABLE) (Cost=4 Card=1 Bytes=3K)
-- continue....
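The speedup factors quoted for the four cases can be checked from the reported elapsed times. The short Python sketch below does only that arithmetic. The dictionary labels are shorthand introduced here, and the timings are the ones stated in the post (1270, 27, 9.6 and 574 seconds).

```python
# Elapsed times in seconds, as reported for the four cases.
timings = {
    "case 1: 10g, CLOB": 1270.0,
    "case 2: 11g, CLOB": 27.0,
    "case 3: 11g, binary, no XMLIndex": 9.6,
    "case 4: 11g, binary, XMLIndex": 574.0,
}

baseline = timings["case 1: 10g, CLOB"]
# Speedup of each case relative to the 10g CLOB run.
speedups = {label: baseline / seconds for label, seconds in timings.items()}

for label, factor in speedups.items():
    print(f"{label}: {factor:.1f}x")
```

This gives roughly 47x for case 2, 132x for case 3 and only 2.2x for case 4, so the XMLIndex run is still faster than 10g but about 60 times slower than binary storage without the index.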
With this very limited test case, I personally concluded that Oracle 11g's engine related XML is much better than 10g, especially when using the binary type, which gives an additional performance boost.
An XBRL document is basically flat, not hierarchically structured, which makes XMLIndex inefficient, I guess.
Is there any good way to use XMLIndex more efficiently in this kind of case?
Please point out anything more I can do.
thanks.
I guess that instead of "...oracle 11g's engine related XML is much better than 10g..." you meant to say "oracle 11g's XQuery engine related XML is much better than 10g"...
Did you create the XMLIndex as described (case 4)...
CREATE INDEX fines_xbrl_binary_ix ON fines_xbrl_binary (xbrl) INDEXTYPE IS XDB.XMLIndex
In other words, you didn't use "path subsetting" (http://www.liberidu.com/blog/?p=242)?
I guess you created statistics ?
Thanks for sharing !!! -
XQuery Performance in BerkeleyDB
We are migrating from IPedo to Berkeley DB.
IPedo did not support multiple indices in its XQueries, so we had to concatenate some fields into one field and index that field; the XQueries were really fast.
Unfortunately the same XQuery does not perform well in BerkeleyDB.
This is how we create the index for this field (ContentKey) in BerkeleyDB:
addIndex '' 'ContentKey' edge-element-equality-string
and this is how I query using Java API.
queryContext.setEvaluationType(XmlQueryContext.Eager);
queryContext.setVariableValue("ContentKey", new XmlValue(
"a0a0188000001115348efcc00000003XXXXXXXXXXXXXYYYYYYYYYYYYYY"));
// Declare the query string
String myQuery = "collection('db/title')/Record[ContentKey=$ContentKey]";
// Prepare (compile) the query
XmlQueryExpression xmlQueryExpression =
dbManager.prepare(myQuery,queryContext);
1. What is wrong with the index or the way I am using the Java API ?
Changing the evaluation type to Lazy did not help at all.
2. The Query performs OK in dbxml.
3. Are there any other commercial/open source tools to evaluate the
performance of a Xquery in BerkeleyDB? Stylus Studio does not support
BDB - 2.3.10 yet.
Any help would be appreciated.
Thanks,
Suresh
Hi John:
Thanks for your mail.
I did declare variable as xs:string external, it did not work. I heard from other engineers in the group that since 2.3.8, “external” variables in BerkeleyDB stopped working. We are using 2.3.10.
I also noticed that the query plan when we using external variables is not valid XML (<GlobalVar name="var external="true">). I hope this is just a toString() issue and nothing major.
I have attached the query plans; I do not see anything different between the two. I would really appreciate your help on this.
Thanks,
Suresh
Here are the query plans:
Query that executes in 2 ms (which has the hard coded value):
Query:
String myQuery = "declare namespace tf = \"http://aplaud.com/ns/0.1/tts/format\";" +
"count (collection('db/title')/Record[ContentKey=\"a0a0188000001115348efcc00000003http://daxweb.org/ns/1.0/taxonomy/Product Type/Gift Receipt\"]/tf:TitleDocument/tf:Title/Content/Detail/GiftInfo/Gift)";
Query Plan:
<XQuery>
<Function name="{http://www.w3.org/2005/xpath-functions}:count">
<DocumentOrder>
<DbXmlNav>
<LookupIndex container="db/title">
<ValueQP index="edge-element-equality-string" operation="eq" parent="Record" child="ContentKey" value="a0a0188000001115348efcc00000003http://daxweb.org/ns/1.0/taxonomy/Product Type/Gift Receipt"/>
</LookupIndex>
<Join type="parent-of-child" return="argument">
<DbXmlNav>
<QueryPlanFunction result="collection" container="db/title">
<OQPlan>V(edge-element-equality-string,Record.ContentKey,=,'a0a0188000001115348efcc00000003http://daxweb.org/ns/1.0/taxonomy/Product Type/Gift Receipt')</OQPlan>
</QueryPlanFunction>
<DbXmlStep axis="child" name="Record" nodeType="element"/>
</DbXmlNav>
</Join>
<DbXmlStep axis="child" prefix="tf" uri="http://aplaud.com/ns/0.1/tts/format" name="TitleDocument" nodeType="element"/>
<DbXmlStep axis="child" prefix="tf" uri="http://aplaud.com/ns/0.1/tts/format" name="Title" nodeType="element"/>
<DbXmlStep axis="child" name="Content" nodeType="element"/>
<DbXmlStep axis="child" name="Detail" nodeType="element"/>
<DbXmlStep axis="child" name="GiftInfo" nodeType="element"/>
<DbXmlStep axis="child" name="Gift" nodeType="element"/>
</DbXmlNav>
</DocumentOrder>
</Function>
</XQuery>
Query that takes 4 seconds (which has the variable declared as xs:string external):
Query:
String myQuery = "declare namespace tf = \"http://aplaud.com/ns/0.1/tts/format\"; declare variable $var as xs:string external;" + "count (collection('db/title')/Record[ContentKey=$var]/tf:TitleDocument/tf:Title/Content/Detail/GiftInfo/Gift)";
Query Plan:
<XQuery>
<GlobalVar name="var external="true">
<SequenceType occurrence="exactly_one" testType="atomic-type" type="http://www.w3.org/2001/XMLSchema:string"/>
</GlobalVar>
<Function name="{http://www.w3.org/2005/xpath-functions}:count">
<DocumentOrder>
<DbXmlNav>
<LookupIndex container="db/title">
<ValueQP index="edge-element-equality-string" operation="eq" parent="Record" child="ContentKey">
<Variable name="var"/>
</ValueQP>
</LookupIndex>
<Join type="parent-of-child" return="argument">
<DbXmlNav>
<QueryPlanFunction result="collection" container="db/title">
<OQPlan>P(edge-element-equality-string,prefix,Record.ContentKey)</OQPlan>
</QueryPlanFunction>
<DbXmlStep axis="child" name="Record" nodeType="element"/>
</DbXmlNav>
</Join>
<DbXmlStep axis="child" prefix="tf" uri="http://aplaud.com/ns/0.1/tts/format" name="TitleDocument" nodeType="element"/>
<DbXmlStep axis="child" prefix="tf" uri="http://aplaud.com/ns/0.1/tts/format" name="Title" nodeType="element"/>
<DbXmlStep axis="child" name="Content" nodeType="element"/>
<DbXmlStep axis="child" name="Detail" nodeType="element"/>
<DbXmlStep axis="child" name="GiftInfo" nodeType="element"/>
<DbXmlStep axis="child" name="Gift" nodeType="element"/>
</DbXmlNav>
</DocumentOrder>
</Function>
</XQuery> -
XQuery Performance in BerkeleyDB
We are migrating from IPedo to Berkeley DB.
IPedo did not support multiple indices in its XQueries, so we had to concatenate some fields into one field and index that field; the XQueries were really fast.
Unfortunately the same XQuery does not perform well in BerkeleyDB.
This is how we create the index for this field (ContentKey) in BerkeleyDB:
addIndex '' 'ContentKey' edge-element-equality-string
and this is how I query using Java API.
queryContext.setEvaluationType(XmlQueryContext.Eager);
queryContext.setVariableValue("ContentKey", new XmlValue(
"a0a0188000001115348efcc00000003XXXXXXXXXXXXXYYYYYYYYYYYYYY"));
// Declare the query string
String myQuery = "collection('db/title')/Record[ContentKey=$ContentKey]";
// Prepare (compile) the query
XmlQueryExpression xmlQueryExpression =
dbManager.prepare(myQuery,queryContext);
1. What is wrong with the index or the way I am using the Java API ?
Changing the evaluation type to Lazy did not help at all.
2. The Query performs OK in dbxml.
3. Are there any other commercial/open source tools to evaluate the
performance of a Xquery in BerkeleyDB? Stylus Studio does not support
BDB - 2.3.10 yet.
Any help would be appreciated.
Thanks,
Suresh
Hi,
I'm sorry, you're in the wrong forum. Please post to the Berkeley DB XML forum:
Berkeley DB XML
Thanks,
Mark -
Hi,
I have a question about XQuery's performance and its Java applications.
I have a bulky flat file, about 500 MB, that I have to parse and use. I already made an XML representation for its entries. The problem is that the project is still new, and it would be easier for me to manipulate a 'data definition' in XML rather than in a DBMS. I have to admit that the tree structure of a 'node' from this file is not really very deep.
- Is it faster to search an XML file using XQuery than to search a flat file using techniques like regular expressions, String class methods, ...etc?
- Would a DBMS be faster than both?
- Any free and reliable java APIs for XQuery (feedback from someone who has actually used it)?
Thanks.
Your questions about performance can't be answered because they are highly dependent on your data and the code you write.
As for the DBMS versus XML question, I would prefer a DBMS if my data fit nicely into tables, but if it were tree-structured I would consider XML. But most XML search software likes to load the entire tree into memory, so 500 MB is going to be hard to deal with. In this case I would seriously consider a database.
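The memory concern above applies to tools that build the whole tree first. A streaming parser avoids that by handling one entry at a time. The sketch below shows the idea in Python with the standard library's `xml.etree.ElementTree.iterparse`. The element names (`entries`, `entry`, `id`, `name`) and the helper `find_entries` are made up for illustration, since the real file layout is not given. Java offers the same streaming style through SAX and StAX.

```python
import io
import xml.etree.ElementTree as ET

def find_entries(source, tag, predicate):
    """Yield the child fields of matching elements, one entry at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            if predicate(elem):
                yield {child.tag: child.text for child in elem}
            # Release the entry's children and text once it has been handled;
            # the emptied element itself stays attached to its parent.
            elem.clear()

# Tiny in-memory stand-in for the 500 MB file (hypothetical structure).
sample = io.BytesIO(b"""<entries>
  <entry><id>1</id><name>alpha</name></entry>
  <entry><id>2</id><name>beta</name></entry>
</entries>""")

matches = list(find_entries(sample, "entry", lambda e: e.findtext("name") == "beta"))
print(matches)
```

A search like this still reads the whole file on every query, so for repeated lookups an indexed store such as a database remains the more natural fit.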
As for implementations of XQuery, if I wanted one I would use Michael Kay's SAXON product which implements XQuery and XSLT 2. I haven't used it myself but I have used its earlier incarnation which was XSLT only, and following its mailing list leads me to believe it is reliable. The schema-aware version costs money but there's a free version that doesn't do schemas. -
Oracle XQuery performance issue in XMLType column
Dear All,
I'm using Oracle 11g to measure the performance.
I'm using data from the XMark project, which is more than 100 MB of XML data intended for benchmarking purposes.
I created a table that contains an XMLType column and uploaded the data into that column. After doing that, I tried a query like this:
select xmlquery(
'for $i in /site/people/person
where $i/id = "person0"
return $i/name'
passing BookXMLContent Returning Content)
from Book;
The purpose of this query is to retrieve the name of the person that has id = 'person0'.
My questions are:
1. Did I do something wrong with my query?
2. Is there any setting on the database that I should change before running the query to get significantly better results?
3. Is there any other approach that is much better than the one I currently use?
Regards,
Anthony Steven
Edited by: mdrake on Nov 4, 2009 6:01 AM
Anthony
First, please read the licensing terms for Oracle (and, I suspect, DB2 and MSFT). You are not allowed to publish externally (in any form, including forum posts :) ) the results of any benchmarking activities. I have edited your post accordingly. I hope this research is not part of a thesis or similar work that you intend to make public, as you and your institution would be in violation of your license agreement were you to do so.
Now back to your question: how can you improve performance for XMark?
#1. Can you show us the create table statement you used, so we can see how you created your XMLType column BOOKXMLCONTENT?
#2. Did you create any indexes?
#3. Did you look at the explain plan output?
-Mark
Edited by: mdrake on Nov 4, 2009 6:06 AM -
XQuery performance vs. Lucene
I'm trying to tune some databases for better query performance. I'm using the BDB XML Java API. The queries I'm trying to tune all return counts, i.e. count(...). Presently a query might take about 500ms to execute, but ideally it would take 50-100ms. I have indexes defined on all the elements/attributes I'm referencing in the queries, but still I can't get them to execute any faster. I was hoping to achieve the 50-100ms query time by comparing the execution time of Lucene queries on a similar dataset, i.e. I take the same set of data and then both index it with Lucene and also store it in BDB XML, then run equivalent queries in each. Lucene consistently can execute in the 50ms range, and BDB XML consistently 10x-20x slower.
Is this just an inherent property of BDB XML? Should the btree indices in BDB execute in the same order of time as Lucene indices, or are my expectations too high? I realize this is highly dependent on the queries, but again BDB is using indexes in all its lookups, and the query can be expressed as a simple XPath.
I have tweaked my BDB cache settings, but db_stat lists 99% cache hits, like this:
31MB 256KB 740B Total cache size
1 Number of caches
31MB 264KB Pool individual cache size
0 Maximum memory-mapped file size
0 Maximum open file descriptors
0 Maximum sequential buffer writes
0 Sleep after writing maximum sequential buffers
55 Requested pages mapped into the process' address space
169M Requested pages found in the cache (99%)
145924 Requested pages not found in the cache
24572 Pages created in the cache
145924 Pages read into the cache
97563 Pages written from the cache to the backing file
148863 Clean pages forced from the cache
17646 Dirty pages forced from the cache
0 Dirty pages written by trickle-sync thread
3971 Current total page count
3952 Current clean page count
19 Current dirty page count
4099 Number of hash buckets used for page location
168M Total number of times hash chains searched for a page (168995875)
6 The longest hash chain searched for a page
274M Total number of hash chain entries checked for page (274808266)
0 The number of hash bucket locks that required waiting (0%)
0 The maximum number of times any hash bucket lock was waited for
0 The number of region locks that required waiting (0%)
170665 The number of page allocations
334509 The number of hash buckets examined during allocations
9 The maximum number of hash buckets examined for an allocation
166509 The number of pages examined during allocations
2 The max number of pages examined for an allocation
Is there anything I'm missing, configuration-wise perhaps?
Thanks for the info, John. Without changing the index definitions, changing the query to
count(collection('sales.dbxml')//als:match-back-matches[@sale-month=200512 and @sale-model='Jetta'])
with a query plan of
n(V(node-attribute-equality-string,@sale-model,=,'Jetta'),V(node-attribute-equality-decimal,@sale-month,=,'200512'),P(node-element-presence-none,=,match-back-matches:http://autoleadservice.com/xml/als))
seems to make the query consistently on the low-end of the previous query's speed, meaning around 3100ms for 11415 results.
Changing the query to
count(collection('sales.dbxml')//als:match-back-matches[@sale-month=200512][@sale-model='Jetta'])
with a query plan of
n(V(node-attribute-equality-decimal,@sale-month,=,'200512'),V(node-attribute-equality-string,@sale-model,=,'Jetta'),P(node-element-presence-none,=,match-back-matches:http://autoleadservice.com/xml/als))
does not seem to make any difference in speed.
Then I added the edge indices as you described, and for the same previous query the query plan becomes
n(V(edge-attribute-equality-decimal,match-back-matches:http://autoleadservice.com/xml/als.@sale-month,=,'200512'),V(edge-attribute-equality-string,match-back-matches:http://autoleadservice.com/xml/als.@sale-model,=,'Jetta'))
I saw this execute in as little as 2625ms... still 187x longer than Lucene.
I appreciate that Lucene and BDB XML are quite different, and the things pointed out in this thread have been very helpful. I only mean to compare them in this very simplified view of direct index lookups, and I wanted to know whether it would be reasonable to expect that BDB XML index lookups, for queries as similar as possible to a Lucene index query, could perform in the same order of time.
For reference, I have an XML collection with about 170,000 documents loaded that look similar to the XML below. The Lucene index I'm querying contains all of the same 170,000 documents as well as some additional data, bringing its collection to about 195,000 documents.
<als:match-back-matches xmlns:als="http://autoleadservice.com/xml/als" direct-sale="true"
has-match="true" sale-area="18 " sale-date="2005-12-14-05:00" sale-day="20051214"
sale-dealer="409460" sale-model="Jetta" sale-month="200512" sale-region="MAR"
sale-year-month="2005-12-05:00" vin="XXX">
<als:match lead-area="18 " lead-date="2005-12-12-05:00" lead-day="20051212"
lead-dealer="409460" lead-id="196973" lead-model="Jetta" lead-month="200512"
lead-region="MAR" lead-source="cobalt-vw" lead-unique-all="true" lead-unique-area="true"
lead-unique-dealer="true" lead-unique-region="true" lead-year-month="2005-12-05:00"
match-range="0-30" owner-address="1200 Main St." owner-alternate-phone="555-863-7264"
owner-email="[email protected]" owner-first-name="rani" owner-last-name="adzarne"
owner-phone="703-742-0900" owner-postal-code="10191"/>
<als:match lead-area="18 " lead-date="2005-12-12-05:00" lead-day="20051212"
lead-dealer="409460" lead-id="197007" lead-model="Jetta" lead-month="200512"
lead-region="MAR" lead-source="vw.com" lead-unique-all="false" lead-unique-area="true"
lead-unique-dealer="false" lead-unique-region="false" lead-year-month="2005-12-05:00"
match-range="0-30" owner-address="1200 Main St." owner-email="[email protected]"
owner-first-name="rani" owner-last-name="zarnegar" owner-postal-code="20191"/>
</als:match-back-matches>
<als:match-back-matches xmlns:als="http://autoleadservice.com/xml/als" direct-sale="true"
has-match="true" sale-area="29 " sale-date="2005-12-29-05:00" sale-day="20051229"
sale-dealer="425213" sale-model="Jetta" sale-month="200512" sale-region="SER"
sale-year-month="2005-12-05:00" vin="YYY">
<als:match lead-area="29 " lead-date="2005-12-14-05:00" lead-day="20051214"
lead-dealer="425213" lead-id="199347" lead-model="Jetta" lead-month="200512"
lead-region="SER" lead-source="edmunds" lead-unique-all="true" lead-unique-area="true"
lead-unique-dealer="true" lead-unique-region="true" lead-year-month="2005-12-05:00"
match-range="0-30" owner-email="[email protected]" owner-first-name="Monique"
owner-last-name="single" owner-phone="555-495-8933" owner-postal-code="60130"/>
</als:match-back-matches>
and the indexes I have defined presently are:
Default Index: node-element-presence-none
Index: node-attribute-equality-boolean for node {}:captured-sale
Index: node-attribute-equality-boolean for node {}:direct-sale
Index: node-attribute-equality-boolean for node {}:has-match
Index: node-attribute-equality-string for node {}:lead-area
Index: node-attribute-equality-date for node {}:lead-date
Index: node-attribute-equality-decimal for node {}:lead-day
Index: node-attribute-equality-string for node {}:lead-dealer
Index: edge-attribute-equality-decimal for node {}:lead-id
Index: node-attribute-equality-string for node {}:lead-model
Index: node-attribute-equality-decimal for node {}:lead-month
Index: node-attribute-equality-string for node {}:lead-region
Index: node-attribute-equality-string for node {}:lead-source
Index: node-attribute-equality-yearMonth for node {}:lead-year-month
Index: node-attribute-equality-string for node {}:match-range
Index: unique-node-metadata-equality-string for node {http://www.sleepycat.com/2002/dbxml}:name
Index: node-attribute-equality-string for node {}:sale-area
Index: node-attribute-equality-date for node {}:sale-date
Index: node-attribute-equality-decimal for node {}:sale-day
Index: node-attribute-equality-string for node {}:sale-dealer
Index: node-attribute-equality-string edge-attribute-equality-string for node {}:sale-model
Index: node-attribute-equality-decimal edge-attribute-equality-decimal for node {}:sale-month
Index: node-attribute-equality-string for node {}:sale-region
Index: node-attribute-equality-yearMonth for node {}:sale-year-month
Index: node-attribute-equality-string for node {}:vin -
I am experiencing performance problems when inserting a 30 MB XML file into an XMLTYPE field. Under Oracle 11, with the schema I am using, the minimum time I can achieve is around 9 minutes, which is too long. Can anyone comment on whether this performance is normal, and possibly suggest how it could be improved while retaining the benefits of structured storage? Thanks in advance for the help :)
sorry for the late reply - I didn't notice that you had replied to my earlier post...
To answer your questions in order:
- I am using "structured" storage because I read ( in this article: [http://www.oracle.com/technology/pub/articles/jain-xmldb.html] ) that this would result in higher xquery performance.
- the schema isn't very large but it is complex. ( as discussed in above article )
I built my table by first registering the schema and then adding the xml elements to the table such that they would be stored in structured storage. i.e.
--// Register schema /////////////////////////////////////////////////////////////
begin
dbms_xmlschema.registerSchema(
schemaurl=>'fof_fob.xsd',
schemadoc=>bfilename('XFOF_DIR','fof_fob.xsd'),
local=>TRUE,
gentypes=>TRUE,
genbean=>FALSE,
force=>FALSE,
owner=>'FOF',
csid=>nls_charset_id('AL32UTF8')
);
end;
COMMIT;
and then created the table using ...
--// Create the XCOMP table /////////////////////////////////////////////////////////////
create table "XCOMP" (
"type" varchar(128) not null,
"id" int not null,
"idstr1" varchar(50),
"idstr2" varchar(50),
"name" varchar(255),
"rev" varchar(20) not null,
"tstamp" varchar(30) not null,
"xmlfob" xmltype)
XMLTYPE "xmlfob" STORE AS OBJECT RELATIONAL
XMLSCHEMA "fof_fob.xsd"
ELEMENT "FOB";
No indexing was specified for this table. Then I inserted the offending 30 MB xml file using (in c#, using ODP.NET under .NET 3.5):
void test(string myName, XElement myXmlElem)
{
    OracleConnection connection = new OracleConnection();
    connection.Open();
    // Column names are quoted because the table was created with quoted lower-case identifiers.
    string statement = "INSERT INTO XCOMP ( \"name\", \"xmlfob\" ) values( :1, :2 )";
    XDocument xDoc = new XDocument(new XDeclaration("1.0", "utf-8", "yes"), myXmlElem);
    OracleCommand insCmd = new OracleCommand(statement, connection);
    OracleXmlType xmlinfo = new OracleXmlType(connection, xDoc.CreateReader());
    insCmd.Parameters.Add(FofDbCmdInsert.Name, OracleDbType.Varchar2, 255);
    insCmd.Parameters.Add(FofDbCmdInsert.Xmldoc, OracleDbType.XmlType);
    insCmd.Parameters[0].Value = myName;
    insCmd.Parameters[1].Value = xmlinfo;
    insCmd.ExecuteNonQuery();
    connection.Close();
}
It took around 9 minutes to execute the ExecuteNonQuery statement, using Oracle 11 Standard Edition running under Windows 2008 64-bit with 8 GB RAM and a single 2.5 GHz core (of a quad-core, running under VMware).
I would much appreciate any suggestions that could speed up the insert performance here. As a temporary solution I chopped some of the information out of the XML document and stored it separately in another table, but this approach has the disadvantage that using XQuery becomes a bit inflexible, although the performance is now in seconds rather than minutes...
I can't see any reason why Oracle's shredding mechanism should be less efficient than manually shredding the information.
Thanks in advance for any helpful hints you can provide! -
Generating large amounts of XML without running out of memory
Hi there,
I need some advice from the experienced XDB users around here. I'm trying to map large amounts of data inside the DB (Oracle 11.2.0.1.0), and by large I mean files of up to several GB. I compared the "low level" mapping via PL/SQL in combination with ExtractValue/XMLQuery against the more elegant XML view mapping, and the view mapping using the XMLTABLE XQuery PATH constructs gave the best performance. So now I have a view that sits on several binary XMLTYPE columns (where the XML files are stored) for the mapping, and another view that sits on top of this mapping view and constructs the nested XML result document via XMLELEMENT(), XMLAGG(), etc. Example code for better understanding:
CREATE OR REPLACE VIEW MAPPING AS
SELECT type, (...) FROM XMLTYPE_BINARY, XMLTABLE ('/ROOT/ITEM' passing xml
COLUMNS
type VARCHAR2(50) PATH 'for $x in .
let $one := substring($x/b012,1,1)
let $two := substring($x/b012,1,2)
return
if ($one eq "A")
then "A"
else if ($one eq "B" and not($two eq "BJ"))
then "AA"
else if (...)
CREATE OR REPLACE VIEW RESULT AS
select XMLELEMENT("RESULTDOC",
(SELECT XMLAGG(
XMLELEMENT("ITEM",
XMLFOREST(
type "ITEMTYPE",
) as RESULTDOC FROM MAPPING;
----------------------------------------------------------------------------------------------------------------------------
Now all I want to do is materialize this document by inserting it into an XMLTYPE table/column.
insert into bla select * from RESULT;
Sounds pretty easy, but I can't get it to work: the DB seems to load a full DOM representation into RAM every time I perform a SELECT or INSERT INTO, or use the xmlgen tool. This representation takes more than 1 GB for a 200 MB XML file, and eventually I run out of memory with
ORA-19202: Error occurred in XML PROCESSING
ORA-04030: out of process memory
My question is how can I get the result document into the table without memory exhaustion. I thought the db would be smart enough to generate some kind of serialization/datastream to perform this task without loading everything into the RAM.
Best regards
The file import is performed via JDBC. CLOB and binary storage are possible up to several GB; the OR storage gives me ORA-22813 when loading files larger than 100 MB. I use a plain prepared statement:
File f = new File( path );
PreparedStatement pstmt = CON.prepareStatement( "insert into " + table + " values ('" + id + "', XMLTYPE(?) )" );
pstmt.setClob( 1, new FileReader(f) , (int)f.length() );
pstmt.executeUpdate();
pstmt.close();
The DB version is 11.2.0.1.0, as mentioned in the initial post.
But this isn't my main problem; the one above is. I prefer using binary XMLType anyway, as it is much easier to index. Does anyone have an idea how to get the large document from the view into an XMLType table? -
How can I define an outer join in the where clause of a FLWOR expression?
Hi, in the sample below I'm joining two views based on username, but what I really want here is an outer join. What is the syntax for that? I tried the (+) notation but that didn't seem to work.
CREATE OR REPLACE PROCEDURE proc_ctsi_all is
XMLdoc XMLType;
BEGIN
DBMS_XDB.deleteResource('/public/CTSI/ctsi_all_rpt1.xml',1);
SELECT XMLQuery(
'<Progress_Report>
<Personnel_Roster>
{for $c in ora:view("CTSI_INVEST_NONPHS_SOURCE_V"),
$cphs in ora:view("CTSI_INVEST_PHS_SOURCE_V")
let $username := $c/ROW/COMMONS_USERNAME/text(),
$expertise := $c/ROW/AREA_OF_EXPERTISE/text(),
$phsorg := $cphs/ROW/PHS_ORGANIZATION/text(),
$activitycode := $cphs/ROW/ACTIVITY_CODE/text(),
$username2 := $cphs/ROW/COMMONS_USERNAME/text()
where $username eq $username2
return
<Investigator>
<Commons_Username>{$username}</Commons_Username>
<Area_of_Expertise>{$expertise}</Area_of_Expertise>
<Federal_PHS_Funding>
<Organization>{$phsorg}</Organization>
<Activity_Code>{$activitycode}</Activity_Code>
<Six_Digit_Grant_Number>{$grantnumber}</Six_Digit_Grant_Number>
</Federal_PHS_Funding>
</Investigator>}
</Personnel_Roster>
</Progress_Report>'
RETURNING CONTENT) INTO XMLdoc FROM DUAL;
IF(DBMS_XDB.CREATERESOURCE('/public/CTSI/ctsi_all_rpt1.xml', XMLdoc)) THEN
DBMS_OUTPUT.PUT_LINE('Resource is created');
ELSE
DBMS_OUTPUT.PUT_LINE('Cannot create resource');
END IF;
COMMIT;
END;
/
What you could do is use the query within an XMLTable call. Via the COLUMNS clause you then pass the "column" as XMLType to a following XMLTable call.
A little bit like the following:
select hdjfdf
from xmltable
({xquery}
PASSING
COLUMNS xmlfrag xmltype path 'xxx'
) a
, xmltable
({the other stuff you need}
PASSING a.xmlfrag
...etc
...etc
I guess something similar can be done directly in XQuery as well. -
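The result the question asks for is that every investigator appears in the output, even one with no matching PHS row. That is the behaviour you get when the second lookup is nested inside the loop over the first source instead of both sources being filtered by a single where clause. Here is a minimal Python sketch of that semantics only; the field names and sample rows are hypothetical and do not come from the real views:

```python
from collections import defaultdict

def left_outer_join(investigators, phs_rows):
    """Return one entry per investigator, with zero or more funding rows attached."""
    # Group the optional side by the join key (hypothetical field "username").
    by_user = defaultdict(list)
    for row in phs_rows:
        by_user[row["username"]].append(row)

    result = []
    for inv in investigators:
        result.append({
            "username": inv["username"],
            "expertise": inv["expertise"],
            # An investigator without funding still appears, with an empty list.
            "funding": by_user.get(inv["username"], []),
        })
    return result

# Hypothetical sample rows, for illustration only.
investigators = [
    {"username": "user_a", "expertise": "Genetics"},
    {"username": "user_b", "expertise": "Oncology"},
]
phs_rows = [
    {"username": "user_a", "organization": "NIH", "activity_code": "R01"},
]

joined = left_outer_join(investigators, phs_rows)
for entry in joined:
    print(entry["username"], len(entry["funding"]))
```

With an inner join, user_b would be dropped from the output; here it is kept with no funding entries.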
I have a clob column with XML data.
<attrs><attr name="ESB_Availability_Status"><string>D</string></attr><attr name="ESB_Available_Stock"><int>0</int></attr><attr name="ESB_IsTaxable"><boolean>true</boolean></attr><attr name="ESB_isLeaseAvailable"><boolean>true</boolean></attr></attrs>
When I use the following query it does not match and find any rows.
SELECT
extractValue(
XmlType(attributes),
'/attrs/attr[@name="ESB_Availability_Status"]/string'
) AS ESB_Availability_Status
FROM
MyTable
WHERE
CONTAINS(
attributes,
'{D} INPATH (/attrs/attr[@name="ESB_Availability_Status"]/string)'
) > 0
But when I update the column with data like this, with value P (or for that matter any other value: N, DQ, etc.), it retrieves data.
<attrs><attr name="ESB_Availability_Status"><string>P</string></attr><attr name="ESB_Available_Stock"><int>0</int></attr><attr name="ESB_IsTaxable"><boolean>true</boolean></attr><attr name="ESB_isLeaseAvailable"><boolean>true</boolean></attr></attrs>
SELECT
extractValue(
XmlType(attributes),
'/attrs/attr[@name="ESB_Availability_Status"]/string'
) AS ESB_Availability_Status
FROM
MyTable
WHERE
CONTAINS(
attributes,
'{P} INPATH (/attrs/attr[@name="ESB_Availability_Status"]/string)'
) > 0
What is happening with the comparison term?
As this question has nothing to do with XML DB, you have lowered your chance of getting the answer you seek.
I think you might be looking for
https://forums.oracle.com/community/developer/english/oracle_database/text
Without knowing your version, and without having an index set up like yours, what about something like
SELECT
extractValue(
XmlType(attributes),
'/attrs/attr[@name="ESB_Availability_Status"]/string[text()="D"]'
) AS ESB_Availability_Status
FROM
MyTable
which does return empty rows if the condition is not met, or
SELECT *
FROM (SELECT
extractValue(
XmlType(attributes),
'/attrs/attr[@name="ESB_Availability_Status"]/string'
) AS ESB_Availability_Status
FROM
MyTable)
WHERE ESB_Availability_Status = 'D';
Of course there are also XMLTable/XQuery based approaches as well if you so desire. -
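To check what the XPath in the queries above selects, independent of Oracle and of any Text index, here is a minimal sketch using Python's standard xml.etree.ElementTree. The function name availability_status is hypothetical, and the XML is the sample from the question. It only illustrates the path and the value filter; it says nothing about how CONTAINS treats the search term.

```python
import xml.etree.ElementTree as ET

# Sample document from the question.
SAMPLE = (
    '<attrs>'
    '<attr name="ESB_Availability_Status"><string>D</string></attr>'
    '<attr name="ESB_Available_Stock"><int>0</int></attr>'
    '<attr name="ESB_IsTaxable"><boolean>true</boolean></attr>'
    '<attr name="ESB_isLeaseAvailable"><boolean>true</boolean></attr>'
    '</attrs>'
)

def availability_status(xml_text):
    """Return the text of attr[@name="ESB_Availability_Status"]/string, or None."""
    root = ET.fromstring(xml_text)  # root is the <attrs> element
    node = root.find("attr[@name='ESB_Availability_Status']/string")
    return None if node is None else node.text

# Two "rows": one with status D, one with status P.
rows = [SAMPLE, SAMPLE.replace("<string>D</string>", "<string>P</string>")]

# Equivalent of the outer WHERE ESB_Availability_Status = 'D' filter.
matching = [r for r in rows if availability_status(r) == "D"]
print(len(matching))
```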
Re: weird behaviour xml query
From the XML DB FAQ (#5 in the announcement list):
How do I use namespaces with XMLQuery()?
How do I declare a namespace prefix mapping with XMLTable()?
Your XML has a default namespace associated with it, so you will need to supply the default namespace to XMLTable/XQuery as well. Also, the XPaths in the COLUMNS clause are relative to the XPath for the XMLTable itself, so I adjusted those as well. The query below returns rows, but I did not verify that it is what you wanted. This should at least give you a start.
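Both points can be seen in isolation outside the database. The following is a minimal sketch in Python's standard xml.etree.ElementTree, using a small invented document: only the namespace URI and element names come from this thread, while the attribute and element values are made-up sample data. An unqualified path finds nothing, a namespace-qualified path finds the row element, and the per-column paths are then evaluated relative to that row element.

```python
import xml.etree.ElementTree as ET

# Prefix "ec" is an arbitrary local choice; only the URI has to match the document.
NS = {"ec": "http://echa.europa.eu/schemas/ecInventory"}

# Invented sample document with a default namespace.
DOC = """<ECSubstanceInventory xmlns="http://echa.europa.eu/schemas/ecInventory">
  <ecSubstances>
    <ECSubstance status="active" creationDate="2009-01-01">
      <ecNumber>200-001-8</ecNumber>
      <casNumber>50-00-0</casNumber>
    </ECSubstance>
  </ecSubstances>
</ECSubstanceInventory>"""

root = ET.fromstring(DOC)

# Without the namespace the elements are simply not found.
unqualified = root.findall("ecSubstances/ECSubstance")

# With the namespace supplied, the row elements are found.
qualified = root.findall("ec:ecSubstances/ec:ECSubstance", NS)

# "Column" paths are relative to each row element, not to the document root.
rows = [
    {
        "status": s.get("status"),  # unprefixed attributes are in no namespace
        "casnumber": s.findtext("ec:casNumber", namespaces=NS),
    }
    for s in qualified
]
print(len(unqualified), rows)
```

The Oracle query from the reply follows.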
select a.*, b.*
from cas_nummers, -- changed
xmltable
(
XMLNamespaces(default 'http://echa.europa.eu/schemas/ecInventory'), -- added
'/ECSubstanceInventory/ecSubstances/ECSubstance'
passing cas_nummers.object_value -- changed
columns
creationDate varchar2(30) path '@creationDate',
status varchar2(20) path '@status',
ecnumber varchar2(20) path 'ecNumber', -- changed
casnumber varchar2(20) path 'casNumber', -- changed
molecularFormula varchar2(20) path 'molecularFormula', -- changed
namelist xmltype path 'ecNames' -- changed
) a,
xmltable
(
XMLNamespaces(default 'http://echa.europa.eu/schemas/ecInventory'), -- added
'ecNames' -- changed
passing a.namelist
columns
ecName varchar2(5) path '.' -- changed
) b
I have replaced the file test1.xml with a file of about 28 MB containing about 100,000 records (say, casNumbers).
I have created a table with the columns
casnumber,ecnumber, molecularformula, cname, status,creationdate
I use the following PL/SQL anonymous block:
declare
cursor c0
is
select to_date(substr(a.creationdate,1,10),'yyyy-mm-dd') as creationdate
, decode(a.status,'active',1,0) as status
, a.ecnumber
, a.casnumber
, a.molecularformula
, b.ecName
from cas_nummers,
xmltable
(
XMLNamespaces(default 'http://echa.europa.eu/schemas/ecInventory'),
'/ECSubstanceInventory/ecSubstances/ECSubstance'
passing cas_nummers.object_value
columns
creationDate varchar2(30) path '@creationDate',
status varchar2(20) path '@status',
ecnumber varchar2(20) path 'ecNumber',
casnumber varchar2(20) path 'casNumber',
molecularFormula varchar2(20) path 'molecularFormula',
namelist xmltype path 'ecNames'
) a,
xmltable
(
XMLNamespaces(default 'http://echa.europa.eu/schemas/ecInventory'), -- added
'ecNames' -- changed
passing a.namelist
columns
ecname varchar2(50) path '.' -- changed
) b;
v_errm varchar2(200);
i pls_integer;
begin
for r_casnr in c0 loop
begin
insert into csa_cas_nummers
( casnummer_id
, cas_nummer
, ec_nummer
, stofnaam
, moleculair_formule
, dd_creatie
, status
, volgnr
)
values
(csa_cas_nummers_seq.nextval
,r_casnr.casnumber
,r_casnr.ecnumber
, trim(r_casnr.stofnaam)
, r_casnr.molecularformula
, r_casnr.creationdate
, r_casnr.status
, 0
);
if mod(i,1000) = 0 then
commit;
end if;
exception
when others then
v_errm :=substr(sqlerrm,1,200);
insert into foute_casnrs
values
(r_casnr.casnumber, v_errm);
dbms_output.put_line( r_casnr.casnumber||' -> '||sqlerrm);
end;
end loop;
end;
The strange thing is that it stops without error at around 38,000 records (when I do select count(1) from cas_nummers), and there are no records in the table foute_casnrs.
I have checked some CAS numbers which are in the XML file but not in the tcas_nummers table.
Any idea what is wrong with my code?
Thanks in advance,
Henk -
I have a query that I precompile and invoke after setting a variable. I do this to precompile the queries in stored modules as precompiling the queries would seem like a good performance enhancement.
When I invoke the query, it returns in 122168.68 (ms)! Obviously this is not ideal so in an attempt to diagnose the problem, I run the same query (same code) but I hard code the value instead of using a variable. Much to my pleasure, the query returned in 17.039 (ms). Now this is great! The only problem is that we can’t go into production with a hard coded variable. Are there any ideas how I can work around this (or better, if there is a patch)?
The code is as follows (the element names have been changed to protect the innocent):
import java.io.File;
import com.sleepycat.db.Environment;
import com.sleepycat.db.EnvironmentConfig;
import com.sleepycat.dbxml.XmlContainer;
import com.sleepycat.dbxml.XmlManager;
import com.sleepycat.dbxml.XmlManagerConfig;
import com.sleepycat.dbxml.XmlQueryContext;
import com.sleepycat.dbxml.XmlQueryExpression;
import com.sleepycat.dbxml.XmlResults;
import com.sleepycat.dbxml.XmlValue;
class BDBTest
{
    public BDBTest()
    {
        XmlContainer container;
        Environment dbEnv;
        XmlManager dbManager;
        XmlQueryContext queryContext;
        EnvironmentConfig envConf;
        XmlManagerConfig managerConfig;
        String xquery;
        XmlQueryExpression xmlQueryExpression;
        container = null;
        dbEnv = null;
        dbManager = null;
        try
        {
            envConf = new EnvironmentConfig();
            envConf.setAllowCreate(true);
            envConf.setInitializeCache(true);
            envConf.setInitializeLocking(true);
            envConf.setInitializeLogging(true);
            envConf.setTransactional(true);
            dbEnv = new Environment(new File("/opt/db"), envConf);
            managerConfig = new XmlManagerConfig();
            managerConfig.setAdoptEnvironment(true);
            managerConfig.setAllowAutoOpen(true);
            dbManager = new XmlManager(dbEnv, managerConfig);
            dbManager.setDefaultContainerType(XmlContainer.NodeContainer);
            container = dbManager.openContainer("db/test");
            queryContext = dbManager.createQueryContext();
            queryContext.setEvaluationType(XmlQueryContext.Eager);
            queryContext.setVariableValue("contentKey", new XmlValue("AlexUserhttp://mydomain.org/ns/1.0/some/test/value/HERE"));
            // This query is very slow.
            xquery = "declare namespace tf = \"http://mydomain.org/ns/0.1/test/format\"; " +
                "count (collection('db/test')/Record[ContentKey=$contentKey]/tf:TestDocument)";
            // This query is very fast.
            // xquery = "declare namespace tf = \"http://mydomain.org/ns/0.1/test/format\"; " +
            // "count (collection('db/test')/Record[ContentKey=\"AlexUserhttp://mydomain.org/ns/1.0/some/test/value/HERE\"]/tf:TestDocument)";
            xmlQueryExpression = dbManager.prepare(xquery, queryContext);
            String qPlan = xmlQueryExpression.getQueryPlan();
            System.out.println("--------------------------------------------------");
            System.out.println(qPlan);
            System.out.println("--------------------------------------------------");
            long ns0 = System.nanoTime();
            XmlResults results = xmlQueryExpression.execute(queryContext);
            long ns1 = System.nanoTime() - ns0;
            double ms1 = (double) ns1 / 1000000;
            String message = "Found ";
            message += results.size() + " documents for query: '";
            message += xquery + " Time to execute: " + ms1 + " (ms)\n";
            System.out.println(message);
            System.out.println(results.next().asNumber());
        }
        catch (Exception e)
        {
            e.printStackTrace(System.err);
        }
    }

    public static void main(String args[]) throws Throwable
    {
        new BDBTest();
    }
}
The query plans are as follows:
SLOW:
<XQuery>
<Function name="{http://www.w3.org/2005/xpath-functions}:count">
<DocumentOrder>
<DbXmlNav>
<LookupIndex container="db/test">
<ValueQP index="edge-element-equality-string" operation="eq" parent="Record" child="ContentKey">
<Variable name="contentKey"/>
</ValueQP>
</LookupIndex>
<Join type="parent-of-child" return="argument">
<DbXmlNav>
<QueryPlanFunction result="collection" container="db/test">
<OQPlan>P(edge-element-equality-string,prefix,Record.ContentKey)</OQPlan>
</QueryPlanFunction>
<DbXmlStep axis="child" name="Record" nodeType="element"/>
</DbXmlNav>
</Join>
<DbXmlStep axis="child" prefix="tf" uri="http://mydomain.org/ns/0.1/test/format" name="TestDocument" nodeType="element"/>
</DbXmlNav>
</DocumentOrder>
</Function>
</XQuery>
Found 1 documents for query: 'declare namespace tf = "http://mydomain.org/ns/0.1/test/format"; count (collection('db/test')/Record[ContentKey=$contentKey]/tf:TestDocument) Time to execute: 122168.68 (ms)
FAST:
<XQuery>
<Function name="{http://www.w3.org/2005/xpath-functions}:count">
<DocumentOrder>
<DbXmlNav>
<LookupIndex container="db/test">
<ValueQP index="edge-element-equality-string" operation="eq" parent="Record" child="ContentKey" value="AlexUserhttp://mydomain.org/ns/1.0/some/test/value/HERE"/>
</LookupIndex>
<Join type="parent-of-child" return="argument">
<DbXmlNav>
<QueryPlanFunction result="collection" container="db/test">
<OQPlan>V(edge-element-equality-string,Record.ContentKey,=,'AlexUserhttp://mydomain.org/ns/1.0/some/test/value/HERE')</OQPlan>
</QueryPlanFunction>
<DbXmlStep axis="child" name="Record" nodeType="element"/>
</DbXmlNav>
</Join>
<DbXmlStep axis="child" prefix="tf" uri="http://mydomain.org/ns/0.1/test/format" name="TestDocument" nodeType="element"/>
</DbXmlNav>
</DocumentOrder>
</Function>
</XQuery>
Found 1 documents for query: 'declare namespace tf = "http://mydomain.org/ns/0.1/test/format"; count (collection('db/test')/Record[ContentKey="AlexUserhttp://mydomain.org/ns/1.0/some/test/value/HERE"]/tf:TestDocument) Time to execute: 17.039 (ms)
We’re using Java with BDB 2.3.10 on 64-bit CentOS (but we see this in other environments as well). I’m happy to give any more info.
Thank you for your help,
Alex
Hi Alex,
I've just answered this question here:
Re: XQuery Performance in BerkeleyDB
John