Insert Speed

Hello.
I have a db with 5M records and 2 associated secondary dbs.
The ENV uses an 800M cache.
I measured the insertion speed: how long it took to insert each batch of 100,000 records.
Before 3M records were inserted, each batch of 100,000 took about 10 sec., which I think is very nice.
But after 3M were inserted, each batch of 100,000 took about 100 sec.
Overall, it took 20 minutes to insert the 5M records.
Is there any way to improve performance without adding more cache?
Thanks.
P.S.
* The primary DB's file size is 4.9 GB.
* The 2 secondary DBs' file size is 700 MB.
* I'm not using transactions.
Result of the insert test:
inserted rows    seconds to insert the last 100,000 records
100,000      5.5
200,000      4.5
300,000      5.6
400,000      5.8
500,000      5.4
600,000      6.7
700,000      5.2
800,000      9.5
900,000      8.1
1,000,000      10
1,100,000      10.8
1,200,000      9.9
1,300,000      10.7
1,400,000      11
1,500,000      11.8
1,600,000      9.6
1,700,000      10.6
1,800,000      10.9
1,900,000      12.2
2,000,000      11
2,100,000      10.8
2,200,000      10.9
2,300,000      11.7
2,400,000      11.1
2,500,000      13.9
2,600,000      10.7
2,700,000      10.6
2,800,000      11.4
2,900,000      11.3
3,000,000      24.2
3,100,000      45.4
3,200,000      53
3,300,000      38
3,400,000      59.4
3,500,000      81.8
3,600,000      83.8
3,700,000      95.6
3,800,000      79
3,900,000      75.8
4,000,000      80.9
4,100,000      98
4,200,000      117.8
4,300,000      110
4,400,000      96
4,500,000      82
4,600,000      101
4,700,000      104
4,800,000      109
4,900,000      110
4,931,099      20
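
For reference, here is a minimal sketch (my own, not the poster's code) of how the setup described above might be opened with the Berkeley DB C API: an 800 MB cache, a btree primary, and two associated secondary databases. The file names and key-extractor callbacks are placeholders, and error handling is omitted.

    #include <db.h>

    /* Hypothetical key extractors for the two secondary indexes. */
    int sec1_key(DB *sec, const DBT *pkey, const DBT *pdata, DBT *skey);
    int sec2_key(DB *sec, const DBT *pkey, const DBT *pdata, DBT *skey);

    /* Open an environment with an 800 MB cache, a btree primary,
     * and two associated secondaries. Error checks omitted. */
    int open_dbs(const char *home, DB_ENV **envp, DB **prip, DB **s1p, DB **s2p)
    {
        DB_ENV *env; DB *pri, *s1, *s2;

        db_env_create(&env, 0);
        env->set_cachesize(env, 0, 800 * 1024 * 1024, 1);   /* 800 MB, one cache region */
        env->open(env, home, DB_CREATE | DB_INIT_MPOOL, 0);

        db_create(&pri, env, 0);
        pri->open(pri, NULL, "primary.db", NULL, DB_BTREE, DB_CREATE, 0);

        db_create(&s1, env, 0);
        s1->set_flags(s1, DB_DUPSORT);          /* secondaries typically allow sorted duplicates */
        s1->open(s1, NULL, "secondary1.db", NULL, DB_BTREE, DB_CREATE, 0);
        pri->associate(pri, NULL, s1, sec1_key, 0);

        db_create(&s2, env, 0);
        s2->set_flags(s2, DB_DUPSORT);
        s2->open(s2, NULL, "secondary2.db", NULL, DB_BTREE, DB_CREATE, 0);
        pri->associate(pri, NULL, s2, sec2_key, 0);

        *envp = env; *prip = pri; *s1p = s1; *s2p = s2;
        return 0;
    }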

You did not mention what hardware you run on. I can give you some numbers that I'm seeing on my MacBook Pro (2.4 GHz, slow laptop disk).
My records are 40 bytes and my keys are 20 bytes. With two secondary indexes I'm inserting those at about 20,000 records per second. The insert speed is pretty constant, whether I'm testing with 200,000 or 10,000,000 records.
The one thing that completely kills performance is doing an import from a dump file on the same filesystem, read record by record. If I do that, then I see your speed pattern: the first few batches go fast, then it drops.
What worked very well for me was to read the import file into memory, all of it or in large chunks. Then there is mostly write access to the database files and nothing else. That speeds things up many times.
S.
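
As a rough illustration of that buffered-load approach, here is a minimal C sketch (my own, assuming the Berkeley DB C API and a hypothetical fixed-size dump record of 64 bytes whose first 20 bytes are the key): load a large chunk of the dump into memory first, then run the put loop against the primary database, so reads and writes don't interleave on the same disk.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <db.h>

    #define RECORD_SIZE   64        /* hypothetical fixed record size in the dump file */
    #define CHUNK_RECORDS 100000    /* how many records to read into memory at a time  */

    /* Read one chunk of the dump into memory, then insert it.
     * Any secondaries associated with dbp are updated automatically.
     * Error handling omitted for brevity. */
    static size_t load_chunk(DB *dbp, FILE *dump)
    {
        char *buf = malloc((size_t)CHUNK_RECORDS * RECORD_SIZE);
        size_t n = fread(buf, RECORD_SIZE, CHUNK_RECORDS, dump);

        for (size_t i = 0; i < n; i++) {
            char *rec = buf + i * RECORD_SIZE;
            DBT key, data;
            memset(&key, 0, sizeof(key));
            memset(&data, 0, sizeof(data));
            key.data  = rec;                  /* hypothetical layout: first 20 bytes are the key */
            key.size  = 20;
            data.data = rec + 20;             /* remaining bytes are the value */
            data.size = RECORD_SIZE - 20;
            dbp->put(dbp, NULL, &key, &data, 0);
        }
        free(buf);
        return n;                             /* records inserted from this chunk; 0 means EOF */
    }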

Similar Messages

  • Record insert speed! help me

    I use SQL Server 2000. In order to increase insert speed, I use the batch method, but the result is too bad.
    Only 2000 records are inserted in one minute. I think I am using batch in a wrong way; could anyone give me a sample of it? A reference is welcome. Thank you.
    By the way, how many records could be inserted in one minute?
    Thanks for reading.

    Insert speed is dictated by:
    - Network speed
    - If the database must reparse the SQL for each insert statement
    - If the inserting is logged or not
    - If there are triggers and constraints in the tables affected by the insert
    - If there are a lot of indexes in the tables affected by the insert.
    If you really need to insert a lot of lines in a batch, use bcp or DTS.
    The executeBatch methods (JDBC 3.0) alleviate the influence of the first two factors (network speed and reparsing SQL statements). The other factors cannot be reduced using pure JDBC.

  • Oracle insert speed (92k records)

    I am inserting 92,000 records into Oracle (nightly job), and I am just trying to speed things up (it is my first C#/Oracle app).
    At the moment, I have 2 ways of inserting the records:
    using Oracle.DataAccess.Client;
    using Oracle.DataAccess.Types;
    1) command.CommandText = "INSERT into B2BE (ledger,sku,descr,Price,PriceT,PriceP,PricePS,PricePX,unitom,Brandname ) VALUES ('" + myLedger + "','" + mySku + "' ,'" + myDescrip + "', " + myRetail + "," + myTrade + " , " + myPP + " , " + myPPS + "," + myPPX + " ,'" + myUnitom + "' ,'" + myBrandname + "' )";
    command.ExecuteNonQuery();
    2) command.ArrayBindCount = maxArray;
    command.CommandText = "INSERT into B2BE (ledger,sku,descr,Price,PriceT,PriceP,PricePS,PricePX,unitom,Brandname ) VALUES (:p_Ledger , :p_Sku , :p_Descrip , :p_Retail , :p_Trade , :p_PP , :p_PPS , :p_PPX , :p_Unitom , :p_Brandname )";
    OracleParameter prm2 = new OracleParameter("p_Ledger", OracleDbType.Varchar2);
    prm2.Direction = ParameterDirection.Input;
    prm2.Value = a_myLedger;
    prm2.Size = maxArray;
    command.Parameters.Add(prm2);
    etc etc...
    Both ways work fine, but I was trying to work out a way of speeding up the process.
    At the moment, option 1) takes 10 mins to process the 92k records doing an insert for each record (in a loop), and option 2) takes 7 mins to process 2 blocks of 60k and then 32k records.
    Are these speeds acceptable, or am I doing things totally wrong?
    (The application attacks old FoxPro tables using CodeBase, and processes them all to get ready to push into Oracle; that process takes all of 45 seconds.) Yes, CodeBase is insane.
    Thanks
    -Chris

    Hi,
    I'm not sure what, but I'd say "you're doing something wrong", unless you have Network issues or something perhaps.
    Inserting 60,000 records, using the following code, takes 3 seconds on my system (the database is local however, so I have minimal network delay). How long does this code take on your system?
    I'm using 10.2.0.2.20 ODP, 10.2.0.3 client/database for what it's worth.
    Cheers
    Greg
    TABLE
    ======
    create table bulkttab(col0 number,col1 varchar2(4000), col2 varchar2(4000), col3 varchar2(4000),
    col4 varchar2(4000),col5 varchar2(4000),col6 varchar2(4000),col7 varchar2(4000),
    col8 varchar2(4000),col9 varchar2(4000));
    CODE
    =========
    private static void arraybind()
    {
        string connectStr = "User Id=scott;Password=tiger;Data Source=orcl";
        int size = 60000;
        int[] myArrayofNums = new int[size];
        string[] myArrayofV2s = new string[size];
        for (int i = 0; i < size; i++)
        {
            myArrayofNums[i] = i;
            myArrayofV2s[i] = "abcdefghijklmnopqrstuvwxyz";
        }
        OracleConnection connection = new OracleConnection(connectStr);
        OracleCommand command = new OracleCommand("insert into bulkttab values(:0,:1,:2,:3,:4,:5,:6,:7,:8,:9)", connection);
        command.ArrayBindCount = size;
        OracleParameter numParam = new OracleParameter("param2", OracleDbType.Int32);
        numParam.Direction = ParameterDirection.Input;
        numParam.Value = myArrayofNums;
        command.Parameters.Add(numParam);
        for (int i = 1; i < 10; i++)
        {
            OracleParameter v2param = new OracleParameter("", OracleDbType.Varchar2);
            v2param.Direction = ParameterDirection.Input;
            v2param.Value = myArrayofV2s;
            command.Parameters.Add(v2param);
        }
        connection.Open();
        DateTime start = DateTime.Now;
        command.ExecuteNonQuery();
        DateTime stop = DateTime.Now;
        Console.WriteLine("{0} records inserted in {1} seconds", size, (stop - start));
        connection.Close();
        command.Dispose();
        connection.Dispose();
    }
    OUTPUT
    ========
    60000 records inserted in 00:00:02.2656250 seconds

  • Tablespace design reflects the insert speed?

    I have an uncomplicated insert statement which inserts about 15,000,000 rows; it selects from 2 tables and inserts into 1. No blocks, locks, etc.
    When I run it on the development server it takes about 8 hours to complete.
    On the test server, it ran for 2 days and did not finish - the speed is about 40,000 records per hour. Same number of records, same indexes, and I computed the statistics before the process.
    Oracle 9.2.4, Sun Solaris - the same patches for the operating system and Oracle.
    The tablespaces parameters though are different...
    What could I look at to improve the performance?
    Thanks a lot.

    Solution 1) BULK COLLECT insert:
    OPEN c_emp;
    LOOP
      FETCH c_emp BULK COLLECT INTO v_type_tab LIMIT 1000;
      EXIT WHEN v_type_tab.COUNT = 0;
      FORALL i IN 1 .. v_type_tab.COUNT
        INSERT INTO insert_table_name VALUES v_type_tab(i);
    END LOOP;
    CLOSE c_emp;
    Solution 2) insert through select:
    INSERT INTO insert_table_name SELECT * FROM select_table_name;

  • Deleted all records in a table, but the insert speed does not change

    I have an empty table, and inserting a record takes 100 ms.
    When this table has 400,000 records, inserting a record takes 1 s. This is OK, because I need to do a comparison based on an index before inserting a record, so more records need more time.
    The problem is that when I delete all records in this table, the insert time is still 1 s; it does not drop back to 100 ms. Why?

    Hello,
    Read through this portion of oracle documentation
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/logical.htm#CNCPT004
    The reason it still takes 1 s is the HWM (the high water mark is the boundary between used and unused space in a segment). When you inserted the 400K records the HWM moved up, and when you deleted all the records the HWM stayed at the same marker; it did not get reset to 0. So when you insert one record, Oracle still searches for free space below the HWM before inserting the data (a regular insert goes through 6 steps before the data is written). If you truncate your table and try again, it will be faster, because TRUNCATE resets the HWM to 0.
    Regards

  • How to speed up inserting my 1,000,000 records into the database?

    my code like:
    <cfloop from="1" to="#inserteddb.getrecordcount()#"
    index="x">
    <!----
    Here make the InsertFieldList and InsertValueList
    --->
    <cfquery datasource="#cfdsn#" name="insertdata">
    insert into inputtest (#InsertFieldList#)
    values (
    <cfqueryparam value="#InsertValueList#"
    cfsqltype="cf_sql_varchar" list="yes">
    )
    </cfquery>
    </cfloop>
    A test insert of 100,000 records took me 30 minutes, but I have 1,000,000 records to insert. Is there any way to enhance the insertion speed?
    Thanks a lot.

    By removing ColdFusion from the process as much as possible.
    Where is the 'insertedDB' data coming from? It looks to be a record set?
    Are you moving data from one data source to another? If so, some DBMS have the ability to insert an entire record set in one step. I do not have the exact syntax at my fingertips, but I have done something like this in the past with Oracle: INSERT INTO aTable SELECT ... FROM bTable.
    Are you building a record set from a text file such as CSV? If so, many DBMS have the ability to do 'bulk' inserts from such text files, and CF does not even need to be involved.
    As you can see, knowing exactly what you are working with will help us provide suggestions on how to improve your process.

  • INSERT of one record takes 40 seconds; That is much too long;

    Oracle 8.1.7
    Tablestructure:
    10 Fields; 3 numeric; 7 Varchar2(10 Bytes, 30, 5, 230, 4, 7, 15);
    The table contains 3,200,000 records. It was filled with sqlldr. The fields are mostly filled; the 230-byte field mostly with 40 bytes.
    A numeric primary key and a numeric foreign key exist. The foreign key is not enforced with Oracle's referential integrity.
    The INSERT of one record takes approximately 40 seconds.
    Oracle and the database are standard configured.
    Computer:
    Windows NT 2000
    2 INTEL - Processors 500 MHz;
    RAM: 500 MB
    1 Disk ULTRA ATA/66 data transfer rate: 10 - 20 MB/sec
    mean access time 9 ms
    Any recommendations to increase the INSERT speed are welcome.

    Hi,
    "The INSERT of one record takes approximately 40 seconds."
    Yes, it seems slow, but if you have only one disk and only 500 MB of memory, I wonder how slow this 40 s really is compared to other operations like switching a logfile or starting the database.
    It also depends on how many indexes must be maintained, whether this time is constant, whether the tablespace has enough extents, ...
    Tuning a database is always difficult; the most important thing is to find the biggest bottleneck. Start by reading the tuning guide, get your ratios, analyze your performance, and see what the problem is (IO/memory/swapping/poor SQL/...).
    With only one disk, it is normal to have contention between the log writer, the database writer, the Oracle software, and the operating system. So your ratios should be read with care! Do not tune everything at the same time! Find the cause, then react appropriately.
    Regards
    Laurent

  • Slow inserts into partitioned table

    I am having trouble inserting into a simple partitioned table after an upgrade to 11.2.0.3. I'm seeing insert times ranging from subsecond up to 10 or 12 seconds. We have pre-created the partitions for this table (and all children, via reference partitioning). We have gathered dictionary and static object stats as well as statistics on all partitions.
    Queries against the dictionary are incredibly slow as well and showing very high io.
    Any help would be greatly appreciated. Thank you for your time.
    Windows 2008 advanced server
    Oracle Enterprise edition 11.2.0.3

    Thread: HOW TO: Post a SQL statement tuning request - template posting

  • Redo log tuning - improving insert rate

    Dear experts!
    We have an OLTP system which produces a large amount of data. After each record is written to our 11.2 database (Standard Edition), a commit is performed (the system architecture can't be changed - for example, to commit only every 10th record).
    So how can we speed up the insert process? As the database in front of the system gets "mirrored" to our data warehouse system, it is running in NOARCHIVELOG mode. I've already tried placing the redo log files on SSD disks, which sped up the insert process.
    Another idea is putting the table on a separate tablespace with the NOLOGGING option. What do you think about this?
    Furthermore, I heard about tuning the redo latch parameters. Does anyone have information about this approach?
    I would be grateful for any information!
    Thanks
    Markus

    >> After each record written to our 11.2 database (standard edition) a commit is performed (the system architecture can't be changed).
    Doing a commit after each insert (or other DML) doesn't mean that the dbwriter process actually writes this data to the db files immediately. The DBWriter process uses an internal algorithm to decide when to apply changes to the db files. You can adjust the frequency of writes to the db files with the "fast_start_mttr_target" parameter.
    >> So how can we speed up the insert process? ... I've already tried placing the redo log files on SSD disks, which sped up the insert process.
    Placing the redo log files on SSD disks is indeed a good action. You can also check the buffer cache hit rate and size. Striping of the filesystems where the redo files reside should also be taken into account.
    >> Another idea is putting the table on a separate tablespace with the NOLOGGING option. What do you think about this?
    It's an extremely bad idea. The NOLOGGING option for a tablespace will lead to an unrecoverable tablespace and, as stated above, will not increase the insert speed.
    >> Furthermore, I heard about tuning the redo latches parameter.
    I don't think you need this.
    Better to check the indexes associated with the tables you insert into. Are they analyzed regularly, and are all of them actually used? Many indexes are created for particular queries and then left unused, yet every DML still has to update all of them.

  • Performance of insert with spatial index

    I'm writing a test that inserts (using OCI) 10,000 2D point geometries (gtype=2001) into a table with a single SDO_GEOMETRY column. I wrote the code doing the insert before setting up the index on the spatial column, thus I was aware of the insert speed (almost instantaneous) without a spatial index (with layer_gtype=POINT), and noticed immediately the performance drop with the index (> 10 seconds).
    Here's the raw timing data of 3 runs in each 3 configuration (the clock ticks every 14 or 15 or 16 ms, thus the zero when it completes before the next tick):
                                       truncate execute commit
    no spatial index                     0.016   0.171   0.016
    no spatial index                     0.031   0.172   0.000
    no spatial index                     0.031   0.204   0.000
    index (1000 default for batch size)  0.141  10.937   1.547
    index (1000 default for batch size)  0.094  11.125   1.531
    index (1000 default for batch size)  0.094  10.937   1.610
    index SDO_DML_BATCH_SIZE=10000       0.203  11.234   0.359
    index SDO_DML_BATCH_SIZE=10000       0.094  10.828   0.344
    index SDO_DML_BATCH_SIZE=10000       0.078  10.844   0.359
    As you can see, I played with SDO_DML_BATCH_SIZE to change the default of 1,000 to 10,000, which does improve the commit speed a bit, from 1.5s to 0.35s (pretty good when you only look at these numbers...), but the shocking part is the almost 11s the inserts are now taking, compared to 0.2s without an index: that's a 50x drop in performance!!!
    I've looked at my table in SQL Developer, and it has no triggers associated, although there has to be something to mark the index as dirty so that it updates itself on commit.
    So where is this huge overhead during the insert coming from?
    (by insert I mean the time OCIStmtExecute takes to run the array-bind of 10,000 points. It's exactly the same code with or without an index).
    Can anyone explain the 50x insert performance drop?
    Any suggestion on how to improve the performance of this scenario?
    To provide another data point, creating the index itself on a populated table (with the same 10,000 points) takes less than 1 second, which is consistent with the commit speeds I'm seeing, and thus puzzles me all the more regarding this 10s insert overhead...
    SQL> set timing on
    SQL> select count(*) from within_point_distance_tab;
      COUNT(*)
         10000
    Elapsed: 00:00:00.01
    SQL> CREATE INDEX with6CDF1526$point$idx
      2            ON within_point_distance_tab(point)
      3    INDEXTYPE IS MDSYS.SPATIAL_INDEX
      4    PARAMETERS ('layer_gtype=POINT');
    Index created.
    Elapsed: 00:00:00.96
    SQL> drop index WITH6CDF1526$POINT$IDX force;
    Index dropped.
    Elapsed: 00:00:00.57
    SQL> CREATE INDEX with6CDF1526$point$idx
      2            ON within_point_distance_tab(point)
      3    INDEXTYPE IS MDSYS.SPATIAL_INDEX
      4    PARAMETERS ('layer_gtype=POINT SDO_DML_BATCH_SIZE=10000');
    Index created.
    Elapsed: 00:00:00.98
    SQL>

    Thanks for your input. We are likely to use partitioning down the line, but what you are describing (partition exchange) is currently beyond my abilities in plain SQL, and how this could be accomplished from an OCI client application without affecting other users and while keeping the transaction boundaries sounds far from trivial (i.e. can it be made transparent to the client application, and does it require privileges the client does not have?). I'll have to investigate this further, though this technique sounds like one accessible to a DBA only, not from a plain client app with non-privileged credentials.
    The thing that I fail to understand though, despite your explanation, is why the slow down is not entirely on the commit. After all, documentation for the SDO_DML_BATCH_SIZE parameter of the Spatial index implies that the index is updated on commit only, where new rows are fed 1,000 or 10,000 at a time to the indexing engine, and I do see time being spent during commit, but it's the geometry insert that slow down the most, and that to me looks quite strange.
    It's so much slower that it's as if each geometry were indexed one at a time, when I'm doing a single insert with an array bind (i.e. equivalent to a bulk operation in PL/SQL). And if so much time is spent during the insert, then why is any time spent during the commit? In my opinion it should be one or the other, but not both. What am I missing? --DD

  • Spatial Insert Performance

    I'm running 9.2.0.3EE on W2K.
    Ran some simple performance tests...
    With a simple non-spatial table (id, lat, lon), I can get inserts up around 12,000 records per second.
    I setup a similar table for use with spatial:
    CREATE TABLE test2 (
    id number not null,
    location MDSYS.SDO_GEOMETRY not null,
    constraint pk_test2 primary key (id)
    );
    When there is no spatial index, I can get about 10,000 inserts per second, similar to the non-spatial table.
    After adding a spatial index, performance drops to 135 inserts/second. That's about 2 orders of magnitude different. Am I doing something radically wrong here, or is this typical with this product?
    Here is the index setup (RTREE Geodetic):
    INSERT INTO USER_SDO_GEOM_METADATA
    VALUES (
    'test2',
    'location',
    MDSYS.SDO_DIM_ARRAY(
    MDSYS.SDO_DIM_ELEMENT('Longitude', -180, 180, 10),
    MDSYS.SDO_DIM_ELEMENT('Latitude', -90, 90, 10)),
    8307 -- SRID for Lon/Lat WGS84 coordinate system
    );
    commit;
    CREATE INDEX test2_spatial_idx
    ON test2(location)
    INDEXTYPE IS MDSYS.SPATIAL_INDEX
    PARAMETERS('LAYER_GTYPE=POINT');
    Any pointers are appreciated!
    thanks,
    --Peter

    Hi,
    Recent testing of 10g on HP 4640 hardware (linux itanium, 1.5 Ghz processors, good disks) yielded insert rates of over 1300 points per second (single process insert rate).
    Features were put into 10g to enable this increase in performance. On other hardware (testing 9iR2 vs. 10g), 10g was better than 2x as fast as 9iR2. I didn't have an older version of Oracle on this machine, so I couldn't compare insert speeds.

  • How to Speed Up Insert and Update?

    I have a script which WRITES records from one table to another, and the table has more than 600,000 records. Can anybody advise me how to speed up this process so that it takes less time? It takes about 3 hours to insert the records into the other table. Kindly advise.

    One of the main factors affecting the insert speed will be the existence of indexes and constraints on the target table. If you can disable constraints and drop indexes before the insert, then enable/recreate them afterwards, you might find it more efficient. In particular, you can create the indexes in parallel, and also do the insert in parallel.
    Are there already rows of data in the target table?

  • After upgrade to 11.2.0.2 , SQL2008 Insert data to Oracle slow using OLEDB

    Hi All,
    Has anybody hit the same issue as in the following case?
    We have a job in SQL2008 that inserts data into Oracle 11g using an OLEDB linked server.
    Previously, on the 9.2.0.8 and 11.2.0.1 versions, the insert speed was very fast.
    But after we upgraded Oracle to 11.2.0.2, the insert speed dropped a lot, from maybe 1 min to 10 min...
    Could any body give any idea ?
    Best Regards
    ChiaChan

    From the 10046 trace file, we found the time is spent on PARSE!
    Has anyone hit the same issue on the 11.2.0.2 version?
    Please HELP!

  • Max DB insertion rate numbers

    Hi All,
    What is the maximum DB insertion rate Oracle can handle? Is there a default Oracle DB insertion rate we should be able to achieve with a given version?
    Thanks

    AFAIK, there is no such limit provided by Oracle. It will depend on your CPU, I/O speed, storage, etc.
    please check
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2108717300346363018
    Re: Oracle insert speed
    Regards
    Rajesh

  • BTREE and duplicate data items: over 300 people read this, nobody answers?

    I have a btree consisting of keys (a 4 byte integer) - and data (a 8 byte integer).
    Both integral values are "most significant byte (MSB) first" since BDB does key compression, though I doubt there is much to compress with such small key size. But MSB also allows me to use the default lexical order for comparison and I'm cool with that.
    The special thing about it is that with a given key, there can be a LOT of associated data, thousands to tens of thousands. To illustrate, a btree with an 8192-byte page size has 3 levels, 0 overflow pages and 35208 duplicate pages!
    In other words, my keys have a large "fan-out". Note that I wrote "can", since some keys only have a few dozen or so associated data items.
    So I configure the b-tree for DB_DUPSORT. The default lexical ordering with set_dup_compare is OK, so I don't touch that. I'm getting the data items sorted as a bonus, but I don't need that in my application.
    However, I'm seeing very poor "put (DB_NODUPDATA) performance", due to a lot of disk read operations.
    While there may be a lot of reasons for this anomaly, I suspect BDB spends a lot of time tracking down duplicate data items.
    I wonder if in my case it would be more efficient to have a b-tree whose key is the combined (4-byte integer, 8-byte integer) and whose data is a zero-length or 1-byte dummy item (in case zero-length is not an option).
    I would lose the ability to iterate with a cursor using DB_NEXT_DUP, but I could simulate it using DB_SET_RANGE and DB_NEXT, checking whether my composite key still has the correct "prefix". That would be a pain in the butt for me, but still workable if there's no other solution.
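    As a rough sketch of that composite-key idea (my own illustration with the Berkeley DB C API; names are hypothetical): pack the 12-byte key MSB-first so the default lexical order matches numeric order, store a zero-length data item, and simulate DB_NEXT_DUP with DB_SET_RANGE followed by DB_NEXT while the 4-byte prefix still matches. With this layout every key is unique, so DB_DUPSORT is no longer needed.

    #include <stdint.h>
    #include <string.h>
    #include <db.h>

    /* Pack a (4-byte id, 8-byte value) pair into a 12-byte MSB-first key. */
    static void make_key(uint32_t id, uint64_t value, unsigned char key[12])
    {
        for (int i = 0; i < 4; i++)
            key[i] = (unsigned char)(id >> (8 * (3 - i)));
        for (int i = 0; i < 8; i++)
            key[4 + i] = (unsigned char)(value >> (8 * (7 - i)));
    }

    /* Visit all "duplicates" of one id: position with DB_SET_RANGE on the
     * smallest key for that id, then walk with DB_NEXT until the prefix changes. */
    static void scan_id(DB *dbp, uint32_t id)
    {
        DBC *cursor;
        DBT key, data;
        unsigned char start[12];
        int ret;

        make_key(id, 0, start);
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = start;
        key.size = sizeof(start);

        dbp->cursor(dbp, NULL, &cursor, 0);
        ret = cursor->c_get(cursor, &key, &data, DB_SET_RANGE);
        while (ret == 0 && key.size >= 4 && memcmp(key.data, start, 4) == 0) {
            /* ... the last 8 bytes of key.data are one associated value ... */
            ret = cursor->c_get(cursor, &key, &data, DB_NEXT);
        }
        cursor->c_close(cursor);
    }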
    Another possibility would be to just add all the data integers as a single big giant data blob item associated with a single (unique) key. But maybe this is just doing what BDB does... and would probably exchange "duplicate pages" for "overflow pages"
    Or, the slowdown is a BTREE thing and I could use a hash table instead. In fact, what I don't know is how duplicate pages influence insertion speed. But the BDB source code indicates that in contrast to BTREE the duplicate search in a hash table is LINEAR (!!!) which is a no-no (from hash_dup.c):
         while (i < hcp->dup_tlen) {
              memcpy(&len, data, sizeof(db_indx_t));
              data += sizeof(db_indx_t);
              DB_SET_DBT(cur, data, len);
              /*
               * If we find an exact match, we're done. If in a sorted
               * duplicate set and the item is larger than our test item,
               * we're done. In the latter case, if permitting partial
               * matches, it's not a failure.
               */
              *cmpp = func(dbp, dbt, &cur);
              if (*cmpp == 0)
                   break;
              if (*cmpp < 0 && dbp->dup_compare != NULL) {
                   if (flags == DB_GET_BOTH_RANGE)
                        *cmpp = 0;
                   break;
              }
              /* ... */
         }
    What's the expert opinion on this subject?
    Vincent

    Hi,
    >> The special thing about it is that with a given key, there can be a LOT of associated data, thousands to tens of thousands. ... However, I'm seeing very poor "put (DB_NODUPDATA) performance", due to a lot of disk read operations.
    In general, performance slowly decreases when there are a lot of duplicates associated with a key. For the Btree access method, lookups and inserts have O(log n) complexity (the search time depends on the number of keys stored in the underlying tree). When doing puts with DB_NODUPDATA, leaf pages have to be searched in order to determine whether the data is a duplicate. Given that for each key there is (in most cases) a large number of associated data items (up to thousands or tens of thousands), an impressive number of pages have to be brought into the cache to check against the duplicate criterion.
    Of course, the problem of sizing the cache and the database's pages arises here. Your settings for both should tend toward large values; that way the cache can accommodate large pages (each hosting hundreds of records).
    Setting the cache and the page size to their ideal values is a process of experimentation.
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/pagesize.html
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/cachesize.html
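    For what it's worth, here is a minimal sketch of where those two knobs live in the C API (my own illustration; the sizes are placeholders to experiment with, not recommendations, and both must be set before the environment and database are opened):

    #include <db.h>

    /* Open a btree database with an explicit cache size and page size. */
    int open_tuned(DB_ENV **envp, DB **dbp, const char *home, const char *file)
    {
        DB_ENV *env;
        DB *db;

        db_env_create(&env, 0);
        env->set_cachesize(env, 1, 0, 1);     /* 1 GB cache in one region (placeholder) */
        env->open(env, home, DB_CREATE | DB_INIT_MPOOL, 0);

        db_create(&db, env, 0);
        db->set_pagesize(db, 32 * 1024);      /* 32 KB pages (placeholder); 64 KB is the maximum */
        db->set_flags(db, DB_DUPSORT);        /* sorted duplicates, as in the original setup */
        db->open(db, NULL, file, NULL, DB_BTREE, DB_CREATE, 0);

        *envp = env;
        *dbp = db;
        return 0;
    }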
    >> While there may be a lot of reasons for this anomaly, I suspect BDB spends a lot of time tracking down duplicate data items. I wonder if in my case it would be more efficient to have a b-tree with the combined (4-byte integer, 8-byte integer) as the key and a zero-length or 1-byte dummy data item.
    Indeed, this should be the best alternative, but testing must be done first. Try this approach and provide us with feedback. You can have records with a zero-length data portion.
    Also, you could provide more information on whether or not you're using an environment and, if so, how you configured it. Have you thought of using multiple threads to load the data?
    >> Another possibility would be to just add all the data integers as a single big giant data blob item associated with a single (unique) key.
    This is a terrible approach, since bringing an overflow page into the cache is more time consuming than bringing in a regular page, and thus a performance penalty results. Also, processing the entire collection of keys and data implies more work from a programming point of view.
    >> Or, the slowdown is a BTREE thing and I could use a hash table instead. ... But the BDB source code indicates that in contrast to BTREE the duplicate search in a hash table is LINEAR.
    The Hash access method does, as you observed, perform a linear search through a duplicate set, so the time to locate an item is proportional to the number of duplicates in the bucket (even though locating the bucket itself is O(1)). Combined with the large number of duplicates you have, using the Hash access method may not improve performance.
    This is a performance/tuning problem and it requires a lot of resources on our part to investigate. If you have a support contract with Oracle, please don't hesitate to raise your issue on Metalink, or indicate that you want this issue to be handled privately, and we will create an SR for you.
    Regards,
    Andrei
