BTREE and duplicate data items: over 300 people read this, nobody answers?

I have a btree consisting of keys (a 4-byte integer) and data (an 8-byte integer).
Both integral values are stored "most significant byte (MSB) first" since BDB does key compression, though I doubt there is much to compress with such a small key size. MSB-first also lets me use the default lexical order for comparison, and I'm cool with that.
The special thing about it is that with a given key, there can be a LOT of associated data, thousands to tens of thousands. To illustrate, a btree with a 8192 byte page size has 3 levels, 0 overflow pages and 35208 duplicate pages!
In other words, my keys have a large "fan-out". Note that I wrote "can", since some keys only have a few dozen or so associated data items.
So I configure the b-tree for DB_DUPSORT. The default lexical ordering with set_dup_compare is OK, so I don't touch that. I'm getting the data items sorted as a bonus, but I don't need that in my application.
However, I'm seeing very poor "put (DB_NODUPDATA) performance", due to a lot of disk read operations.
While there may be a lot of reasons for this anomaly, I suspect BDB spends a lot of time tracking down duplicate data items.
I wonder if in my case it would be more efficient to have a b-tree with as key the combined (4 byte integer, 8 byte integer) and a zero-length or 1-length dummy data (in case zero-length is not an option).
I would lose the ability to iterate with a cursor using DB_NEXT_DUP, but I could simulate it using DB_SET_RANGE and DB_NEXT, checking whether my composite key still has the correct "prefix". That would be a pain in the butt for me, but still workable if there's no other solution.
Another possibility would be to just add all the data integers as a single big giant data blob item associated with a single (unique) key. But maybe this is just doing what BDB does... and would probably exchange "duplicate pages" for "overflow pages"
Or, the slowdown is a BTREE thing and I could use a hash table instead. In fact, what I don't know is how duplicate pages influence insertion speed. But the BDB source code indicates that in contrast to BTREE the duplicate search in a hash table is LINEAR (!!!) which is a no-no (from hash_dup.c):
     while (i < hcp->dup_tlen) {
          memcpy(&len, data, sizeof(db_indx_t));
          data += sizeof(db_indx_t);
          DB_SET_DBT(cur, data, len);
          /*
           * If we find an exact match, we're done. If in a sorted
           * duplicate set and the item is larger than our test item,
           * we're done. In the latter case, if permitting partial
           * matches, it's not a failure.
           */
          *cmpp = func(dbp, dbt, &cur);
          if (*cmpp == 0)
               break;
          if (*cmpp < 0 && dbp->dup_compare != NULL) {
               if (flags == DB_GET_BOTH_RANGE)
                    *cmpp = 0;
               break;
          }
          ...
     }
What's the expert opinion on this subject?
Vincent

Hi,
The special thing about it is that with a given key,
there can be a LOT of associated data, thousands to
tens of thousands. To illustrate, a btree with a 8192
byte page size has 3 levels, 0 overflow pages and
35208 duplicate pages!
In other words, my keys have a large "fan-out". Note
that I wrote "can", since some keys only have a few
dozen or so associated data items.
So I configure the b-tree for DB_DUPSORT. The default
lexical ordering with set_dup_compare is OK, so I
don't touch that. I'm getting the data items sorted
as a bonus, but I don't need that in my application.
However, I'm seeing very poor "put (DB_NODUPDATA)
performance", due to a lot of disk read operations.
In general, performance degrades slowly as the number of duplicates associated with a key grows. For the Btree access method, lookups and inserts have O(log n) complexity, so the search time depends on the number of keys stored in the underlying tree. When doing puts with DB_NODUPDATA, leaf pages have to be searched to determine that the data item is not already present. Given that most of your keys have a large number of associated data items (thousands to tens of thousands), a considerable number of pages has to be brought into the cache to perform the duplicate check.
Of course, this raises the question of how to size the cache and the database's pages. Both settings should tend toward large values, so that the cache can accommodate large pages, each holding hundreds of records.
Finding the ideal cache size and page size is a matter of experimentation.
http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/pagesize.html
http://www.oracle.com/technology/documentation/berkeley-db/db/ref/am_conf/cachesize.html
While there may be a lot of reasons for this anomaly,
I suspect BDB spends a lot of time tracking down
duplicate data items.
I wonder if in my case it would be more efficient to
have a b-tree with as key the combined (4 byte
integer, 8 byte integer) and a zero-length or
1-length dummy data (in case zero-length is not an
option).
Indeed, this should be the best alternative, but it must be tested first. Try this approach and let us know the results.
You can have records with a zero-length data portion.
Also, you could provide more information: are you using an environment and, if so, how did you configure it? Have you thought of using multiple threads to load the data?
Another possibility would be to just add all the
data integers as a single big giant data blob item
associated with a single (unique) key. But maybe this
is just doing what BDB does... and would probably
exchange "duplicate pages" for "overflow pages"
This is a terrible approach, since bringing an overflow page into the cache takes longer than bringing in a regular page, and so a performance penalty results. Also, processing the entire collection of keys and data means more programming work on your side.
Or, the slowdown is a BTREE thing and I could use a
hash table instead. In fact, what I don't know is how
duplicate pages influence insertion speed. But the
BDB source code indicates that in contrast to BTREE
the duplicate search in a hash table is LINEAR (!!!)
which is a no-no (from hash_dup.c):
The Hash access method does, as you observed, search duplicates linearly, so the search time is proportional to the number of items in the bucket, i.e. O(n). Combined with the fact that you don't want duplicate data, using the hash access method may not improve performance.
This is a performance/tuning problem, and investigating it would take a lot of resources on our part. If you have a support contract with Oracle, then please don't hesitate to raise your issue on Metalink or indicate that you want this issue to be handled privately, and we will create an SR for you.
Regards,
Andrei

Similar Messages

  • HT1692 i synced my iphone and have now lost all my contacts and calendar data. How do I get this back?

    I synced my iphone and have now lost all of my contacts and calendar data. How do I get this back?

    It should all be on your computer in whatever program you have been syncing.
    Sync it back.

  • TS3276 Most attachments from a specific address (work) can not be opened, identified as winmail.data and no data available. Not sure if this is based on the e-mail coming from a windows based system, too old of a windows based system or simply how i have

    Most attachments from a specific address (work) can not be opened, identified as winmail.data and no data available. Not sure if this is based on the e-mail coming from a windows based system, too old of a windows based system or simply how i have it set on my Mac.

    Brightbleu wrote:
    Most attachments from a specific address (work) can not be opened, identified as winmail.data and no data available. Not sure if this is based on the e-mail coming from a windows based system, too old of a windows based system or simply how i have it set on my Mac.
    winmail.dat files are not usable on their own. They just preserve RTF within the message.
    http://support.microsoft.com/kb/278061
    Cheers
    Pete

  • After burning CD tried to print jeweled case insert and playlist is typed over. Never had this problem until most recent update to newest version of Itunes

    After burning CD tried to print jeweled case insert and playlist is typed over. Never had this problem until most recent update to newest version of Itunes

    Most likely you have Office 2004 which are PPC-only applications and will not work in Lion. Upgrade to Office 2011. Other alternatives are:
    Apple's iWork suite (Pages, Numbers, and Keynote.)
    Open Office (Office 2007-like suite compatible with OS X.)
    NeoOffice (similar to Open Office.)
    LibreOffice (a new direction for the Open Office suite.)

  • Help!! I was trying to delete an album from iTunes and I accidentally deleted over 300 songs, idk what the **** I did

    I thought I was just deleting one album from iTunes but I must have accidentally hit the shift button or something, because it deleted over 300 songs and now I can't get them out of the trash and back into iTunes. Any suggestions? Be warned I am "technologically ********!"

    Are the songs still found in the finder -> Music -> iTunes -> iTunes Media -> Music?
    If so, you might be able to import them again from there; otherwise you can try to move the songs from your trash to another folder.

  • Photos quit unexpectedly and I then lost over 300 pictures

    I was using Photos to edit pictures.  The program "quit unexpectedly" (I received the error message).  When I reopened the program, over 300 photos had been deleted.  Where are they? Can I get them back?

    I was working with iPhoto downloading pictures
    Do you mean importing photos? If so what were you importing from? If it is a flash drive why not just re-import from it? Where did you scan them?
    As to backups - you really need to always have your backup running - without good backups you will lose all photos sooner or later
    LN

  • Adding a new field and new data item breaks layout

    I am adding a new field to a subform and populating it from a .dat file. Unfortunately, adding this field causes the data output to the subform below it to shift down one line on the page.
    The new field is output at the top of the form, and the lines being shifted are below it.
    Below is the field details and the field as it is in the .dat file.
    Field: F_PO_NUMBER [1]
    Lines: 1  Characters: 20  Angle: 0
    In Subform: Main_Page
    Options: Global
    ^Command JFPAGE_START
    ^Undefine global:~^PAGE 1
    ^define global:SEQUENCE \pic"NUM999999",@D:NUMPAGESPRINTED../\pic"NUM99",@$PAGE../\pic"NUM99",@:JFPAGE_PAGE_OF..
    ^define global:JFPAGE \pic"NUM99",@$PAGE.. of \pic"NUM99",@:JFPAGE_PAGE_OF..
    ^GLOBAL Email COLL_TEL_NUM~12345678~^REFORMAT OFF
    ^GLOBAL F_CURRENCY_CODE
    GBP
    ^FIELD F_PO_NUMBER
    ^COMMENT R_remit_customer
    ^FIELD F_REMIT_TO_CONCATENATED
    Address
    ^COMMENT M_INVOICE_HDR_BOX1
    ^GLOBAL F_TRX_TYPE
    Invoice
    Now the weird part is that if I move the new field in the .dat file as below, it fixes things and everything lines up as it should.
    ^Command JFPAGE_START
    ^Undefine global:~^PAGE 1
    ^define global:SEQUENCE \pic"NUM999999",@D:NUMPAGESPRINTED../\pic"NUM99",@$PAGE../\pic"NUM99",@:JFPAGE_PAGE_OF..
    ^define global:JFPAGE \pic"NUM99",@$PAGE.. of \pic"NUM99",@:JFPAGE_PAGE_OF..
    ^GLOBAL Email COLL_TEL_NUM~12345678~^REFORMAT OFF
    ^GLOBAL F_CURRENCY_CODE
    GBP
    ^COMMENT R_remit_customer
    ^FIELD F_REMIT_TO_CONCATENATED
    Address
    ^COMMENT M_INVOICE_HDR_BOX1
    ^FIELD F_PO_NUMBER
    ^GLOBAL F_TRX_TYPE
    Invoice
    ^FIELD F_TRX_NUMBER
    I'm hoping someone might have an idea of why this is the case, as I'm at a loss as to why it's happening.

    I may not hit on it exactly, but this sounds like Adobe's processing order.
    It wants to process the fields in the order they are in the DAT file.
    I have a similar problem with an overflow field, that isnt last in the DAT file,
    Read this here:
    http://forums.adobe.com/message/2298283#2298283
    Now that may not be exactly your issue, but I think its the same inherent problem,
    Adobes processing order for the fields/subforms.
    I am still trying to resolve my issue, but found the ^field command in the reference manual (Print Agent),
    and am wondering if I can use it.

  • When I "save page as" I also get a folder with gif images, jscript script files and other similar items. how can I stop this.

    When I "save page as" via the file button at the top edge of the page, I also get a folder containing gif images, jscript script files and other similar items. I am not allowed to delete it unless I also delete the page I need. How can I stop this from happening? Is it the way I've configured Firefox, perhaps?
    == since I installed firefox

    Make sure that you have selected "Web Page, complete" to save the page.

  • Sound problems and excessive data usage. Anyone else had this problem?

    I just upgraded from a 3GS to a 4S and have literally had the phone for 3 weeks. Within the first week I got told I'd used 90% of my 500 MB data allowance, when I hadn't even used the phone for much data: only maps and the odd quick web page, no YouTube videos or emails. I found this odd, as O2 told me I didn't even use 500 MB per month with my 3GS, and with that I was streaming YouTube a lot and emailing! And now today all my sound has gone! No ringing, no alert sounds, and when I call I can't hear anything and neither can the person at the other end. I have tried resetting and everything. Has anybody else had these problems?

    The iPhone is DESIGNED to switch to cellular data when asleep, unless it's connected to a power source, in which case it will stay logged onto the WiFi network.  iPhones have ALWAYS behaved this way.
    Either turn cellular data off at night, or leave your phone connected to a power source to continue the WiFI.

  • I am trying to use Find My Phone for people to follow me on the NYC Marathon next week.  Is there a way to set up a temporary User ID and Password to give out to people for this purpose?  I would obviously want to delete that log in info after the race.

    I would like to use Find My Phone for friends and family to follow me during the NYC Marathon next week.  Is there a way to set up or temporarily change my User ID and Password for this purpose?

    New York Runners has an app that is specially designed to do exactly that. It's (or will be) available in the App Store.
    http://www.nytimes.com/2011/10/27/sports/for-new-york-city-marathons-fans-a-cell phone-app-to-keep-in-touch.html?ref=technology
    You might also look at apps like RunMeter. It will let you send updates by email, Facebook or Twitter. Glympse is another app that, though not designed specifically for runners, might also be useful.

  • For the IT people reading this...lack of simplicity?

    I've been involved in IT consulting for the last several years spending a lot of time switching many users over from Windows. I just spent the last week getting my family on iChat AV with iSights on PBG4's and built-in iSights on the new MB/MBPros. Some of us are at locations with regular cable modems and an AX - what I would consider an average, simple setup.
    Even as a tech-oriented person, I found it incredibly annoying to deal with all the issues with getting video chat to work. Port forwarding or DMZ hosting? The average user should NOT have to do that to get iChat to work. I can understand having to deal with technical issues involving a corporate firewall - but this should all work out of the box on your typical AX or linksys router. Why not send all the information over port 80? Also - if port forwarding is necessary, that allows ONLY ONE computer behind each router to use iChat AV. The avg. home is going to have more than one mac - but if you activate port forwarding or DMZ hosting, then you limit iChat Video to one computer.
    Someone please explain the insane logic behind all this...
    Black MB   Mac OS X (10.4.7)  

    After further reading, the simple question becomes:
    Why doesn't Apple support UPnP on their Airport family of devices? It seems the UPnP technology would solve most issues on this forum - why not offer it?
    Black MB   Mac OS X (10.4.7)  

  • HELP IM 13 YEARS PLD AND I NEED ANYNOE TO HELP PLZ READ THIS POST

    I lost all my music from my library and I can't find it anywhere. Is there a way I can take the songs on my iPod and put them in my library? Please help!

    Li Kid,
    I found the following steps to recover a deleted library. Not sure who the author is but it worked for me.
    Start with the iPod disconnected from the computer - DON'T CONNECT IPOD YET
    - open iTunes
    - open iTunes Preferences - this blocks iTunes from seeing an iPod connection; leave the preferences window up and running
    - connect the iPod to the computer, wait about 15 seconds before continuing
    - open 'My Computer'
    - Tools menu, Folder Options, View tab, enable 'show hidden files /folders'
    - open iPod icon in My Computer
    - open iPod_Control folder
    - you should see a folder named Music
    - drag this folder to somewhere on your computer hard drive
    - after the copy completes, right-click the new Music folder on your hard drive and select 'Properties'
    - clear the checkmark next to 'Hidden'
    - Close that explorer window
    - eject iPod from System tray "Safely Remove Hardware" icon. This icon looks like a small gray rectangle with a green arrow floating above it. It's only there when a removable device (like the iPod in this case) is attached to the computer. Right-click & select 'Safely remove..', then click 'Stop' in the next window, OK in the next window, and then Close to complete the ejection.
    - disconnect the iPod from the computer
    - go back to iTunes, cancel the preferences window
    - File menu \ Add folder to Library \ find that Music folder copied over from the iPod
    Your iTunes library should be back in action! But wait - there's more!!

  • Nokia N91 SW people - read this

    Hi
    I just wanted to say that since the release of 2.00 and then 2.10 the N91 is in good shape now. You can see in the blogs etc. that most people who switch to 2.10 are happy. I have had no problems. However, a few people have a problem where they get an "HDD unavailable" message. It's like an application is "holding" the HDD and not releasing it. We have speculated it's a 3rd-party application, a bad database (Gallery, Music Player, Camera??) etc. The sure way to fix it is to format the HDD, but for some people, like LabourPains and Rudy7355, it keeps happening. There has been a lot of self-help between users, but we all hope that for the next N91 release (2.??) you try to nail this down and fix it.
    Here are some useful links (with others embedded) that might help you people nail this one.
    http://www.3g.co.uk/3GForum/showthread.php?t=49626
    http://www.3g.co.uk/3GForum/showthread.php?t=47320
    http://www.3g.co.uk/3GForum/showthread.php?t=48439
    Read past the initial stuff in this one to where there is a lot of good info on the problem
    http://www.3g.co.uk/3GForum/showthread.php?t=48858
    Good luck
    music

    Well, actually no theme creator lets you change the font. You can change other parameters, font colors and stuff like that, but you can't change the font. In OS 9.1, however, the system uses ttf fonts which are located in z:\resources\fonts; you can substitute them with other fonts by placing them in d:\resources\fonts, for example. The fonts have to have the same names as on the Z drive. You can do it with a font editor program, but I don't think that it will change the font size; you can make a normal font bold, for example, so it's going to be a little bigger. BTW, I don't think that Nokia will approve of doing that, but hey, if something goes wrong you just take out the MMC/SD card and reboot the phone without the fonts (because they are on the card itself) and the system will use the defaults. You can experiment then.
    best regards
    blesio

  • Powershell and oracle and duplicate data in table

    I have created a PowerShell script to insert data into an Oracle table from a CSV file, and I want to know how to stop it inserting duplicate rows when the script runs multiple times. My PowerShell script is as follows:
    '{0,-60}{1,20}' -f "Insert TEEN PREGNANCY ICD9 AND ICD10 CODES into the su_edit_detail ",(Get-Date -Format yyyyMMdd:hhmmss);
    $myQuery = @"
    SET PAGES 600;
    SET LINES 4000;
    SET ECHO ON;
    SET serveroutput on;
    WHENEVER sqlerror exit sql.sqlcode;
    "@
    foreach ($file in dir "$($UCMCSVLoadLocation2)" -recurse -filter "*.csv")
    {
        $fileContents = Import-Csv -Path $file.fullName
        foreach ($line in $fileContents)
        {
            $null = Execute-NonQuery-Oracle -sql @"
    insert into SU_EDIT_DETAIL(EDIT_FUNCTION,TABLE_FUNCTION,CODE_FUNCTION,CODE_TYPE,CODE_BEGIN,CODE_END,EXCLUDE,INCLUDE_X,OP_NBR,TRANSCODE,VOID,YMDEFF,YMDEND,YMDTRANS)
    Values
    ('$($line."EDIT_FUNCTION")','$($line."TABLE_FUNCTION")','$($line."CODE_FUNCTION")','$($line."CODE_TYPE")','$($line."CODE_BEGIN")','$($line."CODE_END")',' ',' ', 'MIS', 'C', ' ', 20141001, 99991231, 20131120)
    "@
        }
    }
    Vijay Patel

    please read "PLEASE READ BEFORE POSTING"
    This forum is about the Small Basic programming language.
    Try another forum.
    Jan [ WhTurner ] The Netherlands

  • Upgrading and Duplicates in iphoto 11

    I am having trouble with importing photos and getting duplicates.
    I bought a new MacBook Pro and copied my old iPhoto library from my old computer. I pointed iPhoto 11 to my old library and it did an automatic upgrade of the library file. Next it generated HD thumbnails. I then went to load some other misc photos from an external drive and selected everything in a folder and its subfolders, etc. I was prompted whether I wanted duplicates removed; I checked yes, and chose to do so for all future actions. However, when I do an import it loads all the photos and doesn't remove duplicates before loading them into iPhoto 11.
    Example: if there are 5 files all called IMG3404.JPEG but in various subfolders of the photo folder, it loads all 5 instead of 1. I know the duplicates are a result of bad file management over the course of years. The problem is I have over 30,000 photos and the import will make it 5 times that.
    Is there a way I can load photos into iPhoto 11 without loading duplicates, as in the example?

    They are duplicate files that have been replicated over years of backups, etc. I am now trying to consolidate by importing into iPhoto, but it brings in each one and doesn't recognize that they are duplicates.
    Each set of files are the same photo and have the same creation and modification dates
    Does that help?
