Insert Speed

Hello.
I have a db with 5M records and 2 associated secondary dbs.
The ENV uses an 800M cache.
I measured the insertion speed: how long it took to insert each batch of 100,000 records.
Before 3M records were inserted, each batch of 100,000 took about 10 sec., which I think is very nice.
But after 3M were inserted, each batch of 100,000 took about 100 sec.
Overall, it took 20 minutes to insert the 5M records.
Is there any way to improve performance without adding more cache?
Thanks.
P.S.
* The primary DB's file size is 4.9 GB.
* The 2 secondary DBs' file size is 700 MB.
* I'm not using transactions.
Result of the insert test:
inserted rows    seconds to insert the last 100,000 records
100,000      5.5
200,000      4.5
300,000      5.6
400,000      5.8
500,000      5.4
600,000      6.7
700,000      5.2
800,000      9.5
900,000      8.1
1,000,000      10
1,100,000      10.8
1,200,000      9.9
1,300,000      10.7
1,400,000      11
1,500,000      11.8
1,600,000      9.6
1,700,000      10.6
1,800,000      10.9
1,900,000      12.2
2,000,000      11
2,100,000      10.8
2,200,000      10.9
2,300,000      11.7
2,400,000      11.1
2,500,000      13.9
2,600,000      10.7
2,700,000      10.6
2,800,000      11.4
2,900,000      11.3
3,000,000      24.2
3,100,000      45.4
3,200,000      53
3,300,000      38
3,400,000      59.4
3,500,000      81.8
3,600,000      83.8
3,700,000      95.6
3,800,000      79
3,900,000      75.8
4,000,000      80.9
4,100,000      98
4,200,000      117.8
4,300,000      110
4,400,000      96
4,500,000      82
4,600,000      101
4,700,000      104
4,800,000      109
4,900,000      110
4,931,099      20
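
For reference, here is a minimal sketch (my own, not the poster's code) of how the setup described above might be opened with the Berkeley DB C API: an 800 MB cache, a btree primary, and two associated secondary databases. The file names and key-extractor callbacks are placeholders, and error handling is omitted.

    #include <db.h>

    /* Hypothetical key extractors for the two secondary indexes. */
    int sec1_key(DB *sec, const DBT *pkey, const DBT *pdata, DBT *skey);
    int sec2_key(DB *sec, const DBT *pkey, const DBT *pdata, DBT *skey);

    /* Open an environment with an 800 MB cache, a btree primary,
     * and two associated secondaries. Error checks omitted. */
    int open_dbs(const char *home, DB_ENV **envp, DB **prip, DB **s1p, DB **s2p)
    {
        DB_ENV *env; DB *pri, *s1, *s2;

        db_env_create(&env, 0);
        env->set_cachesize(env, 0, 800 * 1024 * 1024, 1);   /* 800 MB, one cache region */
        env->open(env, home, DB_CREATE | DB_INIT_MPOOL, 0);

        db_create(&pri, env, 0);
        pri->open(pri, NULL, "primary.db", NULL, DB_BTREE, DB_CREATE, 0);

        db_create(&s1, env, 0);
        s1->set_flags(s1, DB_DUPSORT);          /* secondaries typically allow sorted duplicates */
        s1->open(s1, NULL, "secondary1.db", NULL, DB_BTREE, DB_CREATE, 0);
        pri->associate(pri, NULL, s1, sec1_key, 0);

        db_create(&s2, env, 0);
        s2->set_flags(s2, DB_DUPSORT);
        s2->open(s2, NULL, "secondary2.db", NULL, DB_BTREE, DB_CREATE, 0);
        pri->associate(pri, NULL, s2, sec2_key, 0);

        *envp = env; *prip = pri; *s1p = s1; *s2p = s2;
        return 0;
    }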

You did not mention what hardware you run on. I can give you some numbers that I'm seeing on my MacBook Pro (2.4 GHz, slow laptop disk).
My records are 40 bytes and my keys are 20 bytes. With two secondary indexes I'm inserting those at about 20,000 records per second. The insert speed is pretty constant, whether I'm testing with 200,000 or 10,000,000 records.
The one thing that completely kills performance is doing an import from a dump file on the same filesystem, read record by record. If I do that, then I see your speed pattern: the first few batches go fast, then it drops.
What worked very well for me was to read the import file into memory, all of it or in large chunks. Then there is mostly write access to the database files and nothing else. That speeds things up many times.
S.
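
As a rough illustration of that buffered-load approach, here is a minimal C sketch (my own, assuming the Berkeley DB C API and a hypothetical fixed-size dump record of 64 bytes whose first 20 bytes are the key): load a large chunk of the dump into memory first, then run the put loop against the primary database, so reads and writes don't interleave on the same disk.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <db.h>

    #define RECORD_SIZE   64        /* hypothetical fixed record size in the dump file */
    #define CHUNK_RECORDS 100000    /* how many records to read into memory at a time  */

    /* Read one chunk of the dump into memory, then insert it.
     * Any secondaries associated with dbp are updated automatically.
     * Error handling omitted for brevity. */
    static size_t load_chunk(DB *dbp, FILE *dump)
    {
        char *buf = malloc((size_t)CHUNK_RECORDS * RECORD_SIZE);
        size_t n = fread(buf, RECORD_SIZE, CHUNK_RECORDS, dump);

        for (size_t i = 0; i < n; i++) {
            char *rec = buf + i * RECORD_SIZE;
            DBT key, data;
            memset(&key, 0, sizeof(key));
            memset(&data, 0, sizeof(data));
            key.data  = rec;                  /* hypothetical layout: first 20 bytes are the key */
            key.size  = 20;
            data.data = rec + 20;             /* remaining bytes are the value */
            data.size = RECORD_SIZE - 20;
            dbp->put(dbp, NULL, &key, &data, 0);
        }
        free(buf);
        return n;                             /* records inserted from this chunk; 0 means EOF */
    }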

Similar Messages

  • Record insert speed! help me

    I use SQL Server 2000. In order to increase insert speed, I use the batch method, but the result is too bad.
    Only 2000 records are inserted in one minute. I think I am using batch in a wrong way; could anyone give me a sample of it? A reference is welcome. Thank you.
    By the way, how many records could be inserted in one minute?
    Thanks for reading.

    Insert speed is dictated by:
    - Network speed
    - If the database must reparse the SQL for each insert statement
    - If the inserting is logged or not
    - If there are triggers and constraints in the tables affected by the insert
    - If there are a lot of indexes in the tables affected by the insert.
    If you really need to insert a lot of lines in a batch, use bcp or DTS.
    The executeBatch methods (JDBC 3.0) alleviate the influence of the first two factors (network speed and reparsing SQL statements). The other factors cannot be reduced using pure JDBC.

  • Oracle insert speed (92k records)

    I am inserting 92,000 records into Oracle (nightly job), and I am just trying to speed things up (it is my first C#/Oracle app).
    At the moment, I have 2 ways of inserting the records:
    using Oracle.DataAccess.Client;
    using Oracle.DataAccess.Types;
    1) command.CommandText = "INSERT into B2BE (ledger,sku,descr,Price,PriceT,PriceP,PricePS,PricePX,unitom,Brandname ) VALUES ('" + myLedger + "','" + mySku + "' ,'" + myDescrip + "', " + myRetail + "," + myTrade + " , " + myPP + " , " + myPPS + "," + myPPX + " ,'" + myUnitom + "' ,'" + myBrandname + "' )";
    command.ExecuteNonQuery();
    2) command.ArrayBindCount = maxArray;
    command.CommandText = "INSERT into B2BE (ledger,sku,descr,Price,PriceT,PriceP,PricePS,PricePX,unitom,Brandname ) VALUES (:p_Ledger , :p_Sku , :p_Descrip , :p_Retail , :p_Trade , :p_PP , :p_PPS , :p_PPX , :p_Unitom , :p_Brandname )";
    OracleParameter prm2 = new OracleParameter("p_Ledger", OracleDbType.Varchar2);
    prm2.Direction = ParameterDirection.Input;
    prm2.Value = a_myLedger;
    prm2.Size = maxArray;
    command.Parameters.Add(prm2);
    etc etc...
    Both ways work fine, but I was trying to work out a way of speeding up the process.
    At the moment, option 1) takes 10 mins to process the 92k records doing an insert for each record (in a loop), and option 2) takes 7 mins to process 2 blocks of 60k and then 32k records.
    Are these speeds acceptable, or am I doing things totally wrong?
    (The application attacks old FoxPro tables using CodeBase, and processes them all to get ready to push into Oracle; that process takes all of 45 seconds.) Yes, CodeBase is insane.
    Thanks
    -Chris

    Hi,
    I'm not sure what, but I'd say "you're doing something wrong", unless you have Network issues or something perhaps.
    Inserting 60,000 records, using the following code, takes 3 seconds on my system (the database is local however, so I have minimal network delay). How long does this code take on your system?
    I'm using 10.2.0.2.20 ODP, 10.2.0.3 client/database for what it's worth.
    Cheers
    Greg
    TABLE
    ======
    create table bulkttab(col0 number,col1 varchar2(4000), col2 varchar2(4000), col3 varchar2(4000),
    col4 varchar2(4000),col5 varchar2(4000),col6 varchar2(4000),col7 varchar2(4000),
    col8 varchar2(4000),col9 varchar2(4000));
    CODE
    =========
    private static void arraybind()
    {
        string connectStr = "User Id=scott;Password=tiger;Data Source=orcl";
        int size = 60000;
        int[] myArrayofNums = new int[size];
        string[] myArrayofV2s = new string[size];
        for (int i = 0; i < size; i++)
        {
            myArrayofNums[i] = i;
            myArrayofV2s[i] = "abcdefghijklmnopqrstuvwxyz";
        }
        OracleConnection connection = new OracleConnection(connectStr);
        OracleCommand command = new OracleCommand("insert into bulkttab values(:0,:1,:2,:3,:4,:5,:6,:7,:8,:9)", connection);
        command.ArrayBindCount = size;
        OracleParameter numParam = new OracleParameter("param2", OracleDbType.Int32);
        numParam.Direction = ParameterDirection.Input;
        numParam.Value = myArrayofNums;
        command.Parameters.Add(numParam);
        for (int i = 1; i < 10; i++)
        {
            OracleParameter v2param = new OracleParameter("", OracleDbType.Varchar2);
            v2param.Direction = ParameterDirection.Input;
            v2param.Value = myArrayofV2s;
            command.Parameters.Add(v2param);
        }
        connection.Open();
        DateTime start = DateTime.Now;
        command.ExecuteNonQuery();
        DateTime stop = DateTime.Now;
        Console.WriteLine("{0} records inserted in {1} seconds", size, (stop - start));
        connection.Close();
        command.Dispose();
        connection.Dispose();
    }
    OUTPUT
    ========
    60000 records inserted in 00:00:02.2656250 seconds

  • Tablespace design reflects the insert speed?

    I have an uncomplicated insert statement which inserts about 15,000,000 rows; it selects from 2 tables and inserts into 1. No blocks, locks, etc.
    When I run it on the development server it takes about 8 hours to complete.
    On the test server, it ran for 2 days and did not finish - the speed is about 40,000 records per hour. Same number of records, same indexes, and I computed the statistics before the process.
    Oracle 9.2.4, Sun Solaris - the same patches for the operating system and Oracle.
    The tablespaces parameters though are different...
    What could I look at to improve the performance?
    Thanks a lot.

    Solution 1) BULK COLLECT insert:
    OPEN c_emp;
    LOOP
      FETCH c_emp BULK COLLECT INTO v_type_tab LIMIT 1000;
      EXIT WHEN v_type_tab.COUNT = 0;
      FORALL i IN 1 .. v_type_tab.COUNT
        INSERT INTO insert_table_name VALUES v_type_tab(i);
    END LOOP;
    CLOSE c_emp;
    Solution 2) insert through select:
    INSERT INTO insert_table_name SELECT * FROM select_table_name;

  • Deleted all records in a table, but the insert speed does not change

    I have an empty table, and inserting a record takes 100 ms.
    When this table has 400,000 records, inserting a record takes 1 s. This is OK, because I need to do a comparison based on an index before inserting a record, so more records need more time.
    The problem is that when I delete all records in this table, the insert time is still 1 s; it does not drop back to 100 ms. Why?

    Hello,
    Read through this portion of oracle documentation
    http://download.oracle.com/docs/cd/B19306_01/server.102/b14220/logical.htm#CNCPT004
    The reason it still takes 1 s is the HWM (the high water mark is the boundary between used and unused space in a segment). When you inserted the 400K records the HWM moved up, and when you deleted all the records the HWM stayed at the same marker; it did not get reset to 0. So when you insert one record, Oracle still searches for free space below the HWM before inserting the data (a regular insert goes through 6 steps before the data is written). If you truncate your table and try again, it will be faster, because TRUNCATE resets the HWM to 0.
    Regards

  • How to speed up inserting my 1,000,000 records into the database?

    my code like:
    <cfloop from="1" to="#inserteddb.getrecordcount()#"
    index="x">
    <!----
    Here make the InsertFieldList and InsertValueList
    --->
    <cfquery datasource="#cfdsn#" name="insertdata">
    insert into inputtest (#InsertFieldList#)
    values (
    <cfqueryparam value="#InsertValueList#"
    cfsqltype="cf_sql_varchar" list="yes">
    )
    </cfquery>
    </cfloop>
    A test insert of 100,000 records took me 30 minutes, but I have 1,000,000 records to insert. Is there any way to enhance the insertion speed?
    Thanks a lot.

    By removing ColdFusion from the process as much as possible.
    Where is the 'insertedDB' data coming from? It looks to be a record set?
    Are you moving data from one data source to another? If so, some DBMS have the ability to insert an entire record set in one step. I do not have the exact syntax at my fingertips, but I have done something like this in the past with Oracle: INSERT INTO aTable SELECT ... FROM bTable.
    Are you building a record set from a text file such as CSV? If so, many DBMS have the ability to do 'bulk' inserts from such text files, and CF does not even need to be involved.
    As you can see, knowing exactly what you are working with will help us provide suggestions on how to improve your process.

  • INSERT of one record takes 40 seconds; That is much too long;

    Oracle 8.1.7
    Tablestructure:
    10 Fields; 3 numeric; 7 Varchar2(10 Bytes, 30, 5, 230, 4, 7, 15);
    The table contains 3,200,000 records. It was filled with sqlldr. The fields are mostly filled; the 230-byte field mostly with 40 bytes.
    A numeric primary key and a numeric foreign key exist. The foreign key is not enforced with Oracle's referential integrity.
    The INSERT of one record takes approximately 40 seconds.
    Oracle and the database are standard configured.
    Computer:
    Windows NT 2000
    2 INTEL - Processors 500 MHz;
    RAM: 500 MB
    1 Disk ULTRA ATA/66 data transfer rate: 10 - 20 MB/sec
    mean access time 9 ms
    Any recommendations to increase the INSERT speed are welcome.

    Hi,
    "The INSERT of one record takes approximately 40 seconds."
    Yes, it seems slow, but if you have only one disk and only 500 MB of memory, I wonder how slow this 40 s really is compared to other operations like switching a logfile or starting the database.
    It also depends on how many indexes must be maintained, whether this time is constant, whether the tablespace has enough extents, ...
    Tuning a database is always difficult; the most important thing is to find the biggest bottleneck. Start by reading the tuning guide, get your ratios, analyze your performance, and see what the problem is (IO/memory/swapping/poor SQL/...).
    With only one disk, it is normal to have contention between the log writer, the database writer, the Oracle software, and the operating system. So your ratios should be read with care! Do not tune everything at the same time! Find the cause, then react appropriately.
    Regards
    Laurent

  • Slow inserts into partitioned table

    I am having trouble inserting into a simple partitioned table after an upgrade to 11.2.0.3. I'm seeing insert times ranging from subsecond up to 10 or 12 seconds. We have pre-created the partitions for this table (and all children, via reference partitioning). We have gathered dictionary and static object stats as well as statistics on all partitions.
    Queries against the dictionary are incredibly slow as well and showing very high io.
    Any help would be greatly appreciated. Thank you for your time.
    Windows 2008 advanced server
    Oracle Enterprise edition 11.2.0.3

    Thread: HOW TO: Post a SQL statement tuning request - template posting

  • Redo log tuning - improving insert rate

    Dear experts!
    We have an OLTP system which produces a large amount of data. After each record is written to our 11.2 database (Standard Edition), a commit is performed (the system architecture can't be changed - for example, to commit only every 10th record).
    So how can we speed up the insert process? As the database in front of the system gets "mirrored" to our data warehouse system, it is running in NOARCHIVELOG mode. I've already tried placing the redo log files on SSD disks, which sped up the insert process.
    Another idea is putting the table on a separate tablespace with the NOLOGGING option. What do you think about this?
    Furthermore, I heard about tuning the redo latch parameters. Does anyone have information about this approach?
    I would be grateful for any information!
    Thanks
    Markus

    >> After each record written to our 11.2 database (standard edition) a commit is performed (the system architecture can't be changed).
    Doing a commit after each insert (or other DML) doesn't mean that the dbwriter process actually writes this data to the db files immediately. The DBWriter process uses an internal algorithm to decide when to apply changes to the db files. You can adjust the frequency of writes to the db files with the "fast_start_mttr_target" parameter.
    >> So how can we speed up the insert process? ... I've already tried placing the redo log files on SSD disks, which sped up the insert process.
    Placing the redo log files on SSD disks is indeed a good action. You can also check the buffer cache hit rate and size. Striping of the filesystems where the redo files reside should also be taken into account.
    >> Another idea is putting the table on a separate tablespace with the NOLOGGING option. What do you think about this?
    It's an extremely bad idea. The NOLOGGING option for a tablespace will lead to an unrecoverable tablespace and, as stated above, will not increase the insert speed.
    >> Furthermore, I heard about tuning the redo latches parameter.
    I don't think you need this.
    Better to check the indexes associated with the tables you insert into. Are they analyzed regularly, and are all of them actually used? Many indexes are created for particular queries and then left unused, yet every DML still has to update all of them.

  • Performance of insert with spatial index

    I'm writing a test that inserts (using OCI) 10,000 2D point geometries (gtype=2001) into a table with a single SDO_GEOMETRY column. I wrote the code doing the insert before setting up the index on the spatial column, thus I was aware of the insert speed (almost instantaneous) without a spatial index (with layer_gtype=POINT), and noticed immediately the performance drop with the index (> 10 seconds).
    Here's the raw timing data of 3 runs in each 3 configuration (the clock ticks every 14 or 15 or 16 ms, thus the zero when it completes before the next tick):
                                       truncate execute commit
    no spatial index                     0.016   0.171   0.016
    no spatial index                     0.031   0.172   0.000
    no spatial index                     0.031   0.204   0.000
    index (1000 default for batch size)  0.141  10.937   1.547
    index (1000 default for batch size)  0.094  11.125   1.531
    index (1000 default for batch size)  0.094  10.937   1.610
    index SDO_DML_BATCH_SIZE=10000       0.203  11.234   0.359
    index SDO_DML_BATCH_SIZE=10000       0.094  10.828   0.344
    index SDO_DML_BATCH_SIZE=10000       0.078  10.844   0.359
    As you can see, I played with SDO_DML_BATCH_SIZE to change the default of 1,000 to 10,000, which does improve the commit speed a bit, from 1.5s to 0.35s (pretty good when you only look at these numbers...), but the shocking part is the almost 11s the inserts are now taking, compared to 0.2s without an index: that's a 50x drop in performance!!!
    I've looked at my table in SQL Developer, and it has no triggers associated, although there has to be something to mark the index as dirty so that it updates itself on commit.
    So where is this huge overhead during the insert coming from?
    (by insert I mean the time OCIStmtExecute takes to run the array-bind of 10,000 points. It's exactly the same code with or without an index).
    Can anyone explain the 50x insert performance drop?
    Any suggestion on how to improve the performance of this scenario?
    To provide another data point, creating the index itself on a populated table (with the same 10,000 points) takes less than 1 second, which is consistent with the commit speeds I'm seeing, and thus puzzles me all the more regarding this 10s insert overhead...
    SQL> set timing on
    SQL> select count(*) from within_point_distance_tab;
      COUNT(*)
         10000
    Elapsed: 00:00:00.01
    SQL> CREATE INDEX with6CDF1526$point$idx
      2            ON within_point_distance_tab(point)
      3    INDEXTYPE IS MDSYS.SPATIAL_INDEX
      4    PARAMETERS ('layer_gtype=POINT');
    Index created.
    Elapsed: 00:00:00.96
    SQL> drop index WITH6CDF1526$POINT$IDX force;
    Index dropped.
    Elapsed: 00:00:00.57
    SQL> CREATE INDEX with6CDF1526$point$idx
      2            ON within_point_distance_tab(point)
      3    INDEXTYPE IS MDSYS.SPATIAL_INDEX
      4    PARAMETERS ('layer_gtype=POINT SDO_DML_BATCH_SIZE=10000');
    Index created.
    Elapsed: 00:00:00.98
    SQL>

    Thanks for your input. We are likely to use partitioning down the line, but what you are describing (partition exchange) is currently beyond my abilities in plain SQL, and how this could be accomplished from an OCI client application without affecting other users and while keeping the transaction boundaries sounds far from trivial (i.e. can it be made transparent to the client application, and does it require privileges the client does not have?). I'll have to investigate this further, though this technique sounds like one accessible to a DBA only, not from a plain client app with non-privileged credentials.
    The thing that I fail to understand though, despite your explanation, is why the slow down is not entirely on the commit. After all, documentation for the SDO_DML_BATCH_SIZE parameter of the Spatial index implies that the index is updated on commit only, where new rows are fed 1,000 or 10,000 at a time to the indexing engine, and I do see time being spent during commit, but it's the geometry insert that slow down the most, and that to me looks quite strange.
    It's so much slower that it's as if each geometry were indexed one at a time, when I'm doing a single insert with an array bind (i.e. equivalent to a bulk operation in PL/SQL). And if so much time is spent during the insert, then why is any time spent during the commit? In my opinion it should be one or the other, but not both. What am I missing? --DD

  • Spatial Insert Performance

    I'm running 9.2.0.3EE on W2K.
    Ran some simple performance tests...
    With a simple non-spatial table (id, lat, lon), I can get inserts up around 12,000 records per second.
    I setup a similar table for use with spatial:
    CREATE TABLE test2 (
    id number not null,
    location MDSYS.SDO_GEOMETRY not null,
    constraint pk_test2 primary key (id)
    );
    When there is no spatial index, I can get about 10,000 inserts per second, similar to the non-spatial table.
    After adding a spatial index, performance drops to 135 inserts/second. That's about 2 orders of magnitude different. Am I doing something radically wrong here, or is this typical with this product?
    Here is the index setup (RTREE Geodetic):
    INSERT INTO USER_SDO_GEOM_METADATA
    VALUES (
    'test2',
    'location',
    MDSYS.SDO_DIM_ARRAY(
    MDSYS.SDO_DIM_ELEMENT('Longitude', -180, 180, 10),
    MDSYS.SDO_DIM_ELEMENT('Latitude', -90, 90, 10)),
    8307 -- SRID for Lon/Lat WGS84 coordinate system
    );
    commit;
    CREATE INDEX test2_spatial_idx
    ON test2(location)
    INDEXTYPE IS MDSYS.SPATIAL_INDEX
    PARAMETERS('LAYER_GTYPE=POINT');
    Any pointers are appreciated!
    thanks,
    --Peter

    Hi,
    Recent testing of 10g on HP 4640 hardware (linux itanium, 1.5 Ghz processors, good disks) yielded insert rates of over 1300 points per second (single process insert rate).
    Features were put into 10g to enable this increase in performance. On other hardware (testing 9iR2 vs. 10g), 10g was better than 2x as fast as 9iR2. I didn't have an older version of Oracle on this machine, so I couldn't compare insert speeds.

  • How to Speed Up Insert and Update?

    I have a script which WRITES records from one table to another, and the table has more than 600,000 records. Can anybody advise me how to speed up this process so that it takes less time? It takes about 3 hours to insert the records into the other table. Kindly advise.

    One of the main factors affecting the insert speed will be the existence of indexes and constraints on the target table. If you can disable constraints and drop indexes before the insert, then enable/recreate them afterwards, you might find it more efficient. In particular, you can create the indexes in parallel, and also do the insert in parallel.
    Are there already rows of data in the target table?

  • After upgrade to 11.2.0.2 , SQL2008 Insert data to Oracle slow using OLEDB

    Hi All,
    Has anybody hit the same issue as in the following case?
    We have a job in SQL2008 that inserts data into Oracle 11g using an OLEDB linked server.
    Previously, on the 9.2.0.8 and 11.2.0.1 versions, the insert speed was very fast.
    But after we upgraded Oracle to 11.2.0.2, the insert speed dropped a lot, from maybe 1 min to 10 min...
    Could any body give any idea ?
    Best Regards
    ChiaChan

    From the 10046 trace file, we found the time is spent on PARSE!
    Has anyone hit the same issue on the 11.2.0.2 version?
    Please HELP!

  • Max DB insertion rate numbers

    Hi All,
    What is the maximum DB insertion rate Oracle can handle? Is there a default Oracle DB insertion rate we should be able to achieve with a given version?
    Thanks

    AFAIK, there is no such limit provided by Oracle. It will depend on your CPU, I/O speed, storage, etc.
    please check
    http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:2108717300346363018
    Re: Oracle insert speed
    Regards
    Rajesh

  • BTREE and duplicate data items: over 300 people read this, nobody answers?

    I have a btree consisting of keys (a 4 byte integer) - and data (a 8 byte integer).
    Both integral values are "most significant byte (MSB) first" since BDB does key compression, though I doubt there is much to compress with such small key size. But MSB also allows me to use the default lexical order for comparison and I'm cool with that.
    The special thing about it is that with a given key, there can be a LOT of associated data, thousands to tens of thousands. To illustrate, a btree with an 8192-byte page size has 3 levels, 0 overflow pages and 35208 duplicate pages!
    In other words, my keys have a large "fan-out". Note that I wrote "can", since some keys only have a few dozen or so associated data items.
    So I configure the b-tree for DB_DUPSORT. The default lexical ordering with set_dup_compare is OK, so I don't touch that. I'm getting the data items sorted as a bonus, but I don't need that in my application.
    However, I'm seeing very poor "put (DB_NODUPDATA) performance", due to a lot of disk read operations.
    While there may be a lot of reasons for this anomaly, I suspect BDB spends a lot of time tracking down duplicate data items.
    I wonder if in my case it would be more efficient to have a b-tree whose key is the combined (4-byte integer, 8-byte integer) and whose data is a zero-length or 1-byte dummy item (in case zero-length is not an option).
    I would lose the ability to iterate with a cursor using DB_NEXT_DUP, but I could simulate it using DB_SET_RANGE and DB_NEXT, checking whether my composite key still has the correct "prefix". That would be a pain in the butt for me, but still workable if there's no other solution.
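    As a rough sketch of that composite-key idea (my own illustration with the Berkeley DB C API; names are hypothetical): pack the 12-byte key MSB-first so the default lexical order matches numeric order, store a zero-length data item, and simulate DB_NEXT_DUP with DB_SET_RANGE followed by DB_NEXT while the 4-byte prefix still matches. With this layout every key is unique, so DB_DUPSORT is no longer needed.

    #include <stdint.h>
    #include <string.h>
    #include <db.h>

    /* Pack a (4-byte id, 8-byte value) pair into a 12-byte MSB-first key. */
    static void make_key(uint32_t id, uint64_t value, unsigned char key[12])
    {
        for (int i = 0; i < 4; i++)
            key[i] = (unsigned char)(id >> (8 * (3 - i)));
        for (int i = 0; i < 8; i++)
            key[4 + i] = (unsigned char)(value >> (8 * (7 - i)));
    }

    /* Visit all "duplicates" of one id: position with DB_SET_RANGE on the
     * smallest key for that id, then walk with DB_NEXT until the prefix changes. */
    static void scan_id(DB *dbp, uint32_t id)
    {
        DBC *cursor;
        DBT key, data;
        unsigned char start[12];
        int ret;

        make_key(id, 0, start);
        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = start;
        key.size = sizeof(start);

        dbp->cursor(dbp, NULL, &cursor, 0);
        ret = cursor->c_get(cursor, &key, &data, DB_SET_RANGE);
        while (ret == 0 && key.size >= 4 && memcmp(key.data, start, 4) == 0) {
            /* ... the last 8 bytes of key.data are one associated value ... */
            ret = cursor->c_get(cursor, &key, &data, DB_NEXT);
        }
        cursor->c_close(cursor);
    }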
    Another possibility would be to just add all the data integers as a single big giant data blob item associated with a single (unique) key. But maybe this is just doing what BDB does... and would probably exchange "duplicate pages" for "overflow pages"
    Or, the slowdown is a BTREE thing and I could use a hash table instead. In fact, what I don't know is how duplicate pages influence insertion speed. But the BDB source code indicates that in contrast to BTREE the duplicate search in a hash table is LINEAR (!!!) which is a no-no (from hash_dup.c):
         while (i < hcp->dup_tlen) {
              memcpy(&len, data, sizeof(db_indx_t));
              data += sizeof(db_indx_t);
              DB_SET_DBT(cur, data, len);
              /*
               * If we find an exact match, we're done. If in a sorted
               * duplicate set and the item is larger than our test item,
               * we're done. In the latter case, if permitting partial
               * matches, it's not a failure.
               */
              *cmpp = func(dbp, dbt, &cur);
              if (*cmpp == 0)
                   break;
              if (*cmpp < 0 && dbp->dup_compare != NULL) {
                   if (flags == DB_GET_BOTH_RANGE)
                        *cmpp = 0;
                   break;
              }
              /* ... */
         }
    What's the expert opinion on this subject?
    Vincent

    Hi,
    >> The special thing about it is that with a given key, there can be a LOT of associated data, thousands to tens of thousands. ... However, I'm seeing very poor "put (DB_NODUPDATA) performance", due to a lot of disk read operations.
    In general, performance slowly decreases when there are a lot of duplicates associated with a key. For the Btree access method, lookups and inserts have O(log n) complexity (the search time depends on the number of keys stored in the underlying tree). When doing puts with DB_NODUPDATA, leaf pages have to be searched in order to determine whether the data is a duplicate. Given that for each key there is (in most cases) a large number of associated data items (up to thousands or tens of thousands), an impressive number of pages have to be brought into the cache to check against the duplicate criterion.
    Of course, the problem of sizing the cache and the database's pages arises here. Your settings for both should tend toward large values; that way the cache can accommodate large pages (each hosting hundreds of records).
    Setting the cache and the page size to their ideal values is a process of experimentation.
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/pagesize.html
    http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/cachesize.html
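    For what it's worth, here is a minimal sketch of where those two knobs live in the C API (my own illustration; the sizes are placeholders to experiment with, not recommendations, and both must be set before the environment and database are opened):

    #include <db.h>

    /* Open a btree database with an explicit cache size and page size. */
    int open_tuned(DB_ENV **envp, DB **dbp, const char *home, const char *file)
    {
        DB_ENV *env;
        DB *db;

        db_env_create(&env, 0);
        env->set_cachesize(env, 1, 0, 1);     /* 1 GB cache in one region (placeholder) */
        env->open(env, home, DB_CREATE | DB_INIT_MPOOL, 0);

        db_create(&db, env, 0);
        db->set_pagesize(db, 32 * 1024);      /* 32 KB pages (placeholder); 64 KB is the maximum */
        db->set_flags(db, DB_DUPSORT);        /* sorted duplicates, as in the original setup */
        db->open(db, NULL, file, NULL, DB_BTREE, DB_CREATE, 0);

        *envp = env;
        *dbp = db;
        return 0;
    }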
    >> While there may be a lot of reasons for this anomaly, I suspect BDB spends a lot of time tracking down duplicate data items. I wonder if in my case it would be more efficient to have a b-tree with the combined (4-byte integer, 8-byte integer) as the key and a zero-length or 1-byte dummy data item.
    Indeed, this should be the best alternative, but testing must be done first. Try this approach and provide us with feedback. You can have records with a zero-length data portion.
    Also, you could provide more information on whether or not you're using an environment and, if so, how you configured it. Have you thought of using multiple threads to load the data?
    >> Another possibility would be to just add all the data integers as a single big giant data blob item associated with a single (unique) key.
    This is a terrible approach, since bringing an overflow page into the cache is more time consuming than bringing in a regular page, and thus a performance penalty results. Also, processing the entire collection of keys and data implies more work from a programming point of view.
    >> Or, the slowdown is a BTREE thing and I could use a hash table instead. ... But the BDB source code indicates that in contrast to BTREE the duplicate search in a hash table is LINEAR.
    The Hash access method does, as you observed, perform a linear search through a duplicate set, so the time to locate an item is proportional to the number of duplicates in the bucket (even though locating the bucket itself is O(1)). Combined with the large number of duplicates you have, using the Hash access method may not improve performance.
    This is a performance/tuning problem and it requires a lot of resources on our part to investigate. If you have a support contract with Oracle, please don't hesitate to raise your issue on Metalink, or indicate that you want this issue to be handled privately, and we will create an SR for you.
    Regards,
    Andrei
