What's the best strategy for storing URLs as keys?

I need to create an index that maps URLs to 8-byte longs. I know that I can just use StringBinding and LongBinding to create my entries, but I'm wondering whether there's anything I could do to make more efficient use of memory. I need quick lookups, but I also need to store a lot of data (600M URLs and growing). The average length of a URL is about 90 characters.
Since I don't know what BDB actually keeps in memory at runtime, I'm wondering what advantage there is (if any) from using hashes for the keys instead of the full URLs. I'd have to store the URL in the data to distinguish hash collisions. There's obviously a trade-off between the size of the hash output and the number of collisions. I can't imagine that the lookup time would suffer that much if I have to iterate through all of the records that hash to a particular key, but I'd like to know if you have a reason to contradict that assumption.
In summary, I'd like to know what BDB would prefer from a performance standpoint (both size and speed):
+ smallest key/data records (store URL as key, long as data)
+ smaller keys/larger data/minimal collisions (store 160-bit hash of URL as key, URL and long as data)
+ even smaller keys/same data/more collisions (store 64-bit hash of URL as key, URL and long as data)
The key size would also be much more consistent if I use a hash function. I don't know if BDB cares about that.
I know that every situation is different and that I could just run my own performance tests (and I will), but it takes a few days to load the data into the index, so I'd like to call on the experience here to avoid making any really bad decisions and save me some time.
Thanks,
-Justin

Hi Justin,
I know you're not looking for an "it all depends" answer, but it does depend largely on how much of the data set you're accessing fits into the JE cache. If you're not sure about this and can't measure it by running your app and looking at the EnvironmentStats, please run the DbCacheSize utility. There's an FAQ about it, and several OTN threads.
If the large majority of the data set you access frequently fits into the cache, you'll get maximum performance with the first option you mention (no hashing), since you don't have to filter out hash collisions.
If I/O to read from disk is a big factor (because the JE cache isn't large enough), then you may be able to reduce this problem by trying to fit more into the JE cache.
Reducing the key size by using a hash is one way to fit more data into cache, since the keys are stored multiple times in the Btree internal nodes. However, unless the records with the same hash are likely to have been read recently, reading the colliding records will result in more random reads, and that will probably outweigh the advantage. JE stores each record separately on disk in write order, so records with the same hash key will not be stored in "pages" together -- JE has no pages.
So instead, you may want to try configuring key prefixing, if the URLs are likely to have common prefixes. See DatabaseConfig.setKeyPrefixing.
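Whether key prefixing pays off depends on how much neighbouring keys have in common. The following is a minimal sketch for estimating that on a sample of URLs before committing to a multi-day load. It is only a rough proxy: it compares each URL with its predecessor in sorted order, which is an assumption made for illustration and not a description of how JE computes prefixes internally. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class PrefixEstimate {
    /** Number of leading characters that two strings have in common. */
    static int commonPrefix(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Fraction of all key characters that are shared with the previous key
     * once the sample is sorted. A value near 1.0 suggests long common
     * prefixes; a value near 0.0 suggests prefixing will help little.
     */
    static double sharedFraction(String[] urls) {
        String[] sorted = urls.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        long total = 0;
        long shared = 0;
        for (int i = 0; i < sorted.length; i++) {
            total += sorted[i].length();
            if (i > 0) {
                shared += commonPrefix(sorted[i - 1], sorted[i]);
            }
        }
        return total == 0 ? 0.0 : (double) shared / total;
    }
}
```

Running `sharedFraction` over a few thousand consecutive URLs from the real data set gives a quick indication of whether trying DatabaseConfig.setKeyPrefixing is worth a full load.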
If that isn't sufficient, you could use a hash key and store all the records with that hash in a single record. In other words, you could store multiple logical records per JE record. This is quite a bit more work on your part, since you have to handle the packing and unpacking, as well as deletions, etc. For that reason I don't recommend it, but we have had some users do this successfully.
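The "multiple logical records per JE record" idea can be sketched with the standard library alone. This is a sketch under stated assumptions, not JE API and not a layout any JE user is known to have used: the key is assumed to be the first 8 bytes of a SHA-1 digest of the URL, and the record layout (entry count, then length-prefixed UTF-8 URL plus an 8-byte long per entry) is invented for illustration. `UrlBucket`, `hashKey`, `pack` and `unpack` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class UrlBucket {
    /** Fixed-size 64-bit key: the first 8 bytes of SHA-1(url). */
    static byte[] hashKey(String url) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return Arrays.copyOf(sha1.digest(url.getBytes(StandardCharsets.UTF_8)), 8);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Packs every (URL, value) pair that shares one hash key into a single byte array. */
    static byte[] pack(Map<String, Long> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(entries.size());
            for (Map.Entry<String, Long> e : entries.entrySet()) {
                byte[] url = e.getKey().getBytes(StandardCharsets.UTF_8);
                out.writeInt(url.length);     // length prefix so URLs need no delimiter
                out.write(url);
                out.writeLong(e.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses pack(): returns the colliding URLs and their values in stored order. */
    static Map<String, Long> unpack(byte[] data) {
        Map<String, Long> entries = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] url = new byte[in.readInt()];
                in.readFully(url);
                entries.put(new String(url, StandardCharsets.UTF_8), in.readLong());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entries;
    }
}
```

In this scheme a lookup reads one record by `hashKey(url)`, unpacks it, and checks the map for the full URL. An insert or delete is a read, a change to the map, and a write of the repacked bytes, which is the extra bookkeeping mentioned above.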
--mark

Similar Messages

  • What is the best app for storing all passwords in a secure area? osx snow leopard

    What is the best app for storing all passwords in a secure area? osx snow leopard

    Keychain is nice, but I prefer to use 1Password. It's got a nice user interface, and does what I need. It stores passwords, notes, identity info, credit card info, application registration numbers, saves hardware information such as your router settings, and more. It also has a secure password generator. There's an extension that keeps a button in the browser toolbar for Safari, etc., so you can submit your info (password, credit card) with the click of a button, and without having to type the information in through the website you are visiting. It also pays attention when you are signing in to a website, and asks you if you want to save the login to 1Password. Once you get them all entered, it's a big timesaver.
    http://itunes.apple.com/us/app/1password/id443987910?mt=12
    The link above says it is for OS X 10.7 and up, but if you go to the developer's website (link below), it says it works with OS X 10.6 as well. You can also download a 30-day demo from the link below.
    http://agilebits.com/onepassword/mac

  • What is the best method for storing/retrieving images?

    Friends,
    OS: RHEL AS 3
    DB: 9iR2
    We have a critical situation. We need to scan all our purchase orders and store them on our PCs. We don't want to store the images in the database; it would be a big headache. We have 4 branches in different cities. All the branches are connected to our local network. We have a Novell server in every branch to store the Oracle forms, and the db is centralized in our head office. The purpose is to view any PO in any branch.
    So, what will be the best solution for storing the images for our archiving?
    thanks & regards

    Yes - a good quality external hard drive is the answer
    Moving the iPhoto library is safe and simple - quit iPhoto and drag the iPhoto library intact as a single entity to the external drive - depress the option key and launch iPhoto using the "select library" option to point to the new location on the external drive - fully test it and then trash the old library on the internal drive (test one more time prior to emptying the trash)
    And be sure that the External drive is formatted Mac OS extended (journaled) (iPhoto does not work with drives with other formats) and that it is always available prior to launching iPhoto
    And backup soon and often - having your iPhoto library on an external drive is not a backup and if you are using Time Machine you need to check and be sure that TM is backing up your external drive
    LN

  • What is the Best approach for storing passwords in database

    Hello developers
    I have two questions.
    1. What is the best way to store passwords in a database table? I want to use something like DES3 or another strong algorithm, and I need a reference implementation. If someone knows a good web example or tutorial (also on the Sun site), please point me to it. I didn't find anything like that, but I may have looked in the wrong places.
    2. I am developing a J2EE application. If I am going to use DES3, where should I put the key? I don't want the key to be available to everyone, so where should I put it?
    Yoav

    What you need to understand is the difference between a hash and a block cipher like DES (and more than likely I'm not the best one to do it but here it goes):
    First of all, you are using a monoalphabetic cipher in your current implementation. You're right, it's very weak and people have been breaking them on pen and paper since around 1200 A.D.
    DES is like the monoalphabetic cipher in one important respect: Anything you use the cipher on, for example the String "hello" you can then reverse and return to the original value "hello." That is generally not the way passwords are stored, they usually use a hash (as the previous poster stated) like MD5.
    If I hash the String value "hello", I will not be able to reverse it. However, the trick is that if I hash the same String value again I will get the same hash value. This is important. So use this algorithm (that the previous poster also mentioned):
    Store Password Hash
    1) Get user input password
    2) Use MD5 (or SHA1) on the password to get a hash value of the password
    3) Store the hash value of the password in your database
    Authenticate a User (i.e. let them login)
    1) Get user input password
    2) Use MD5 (or SHA1) on the password to get a hash value of the password
    3) Retrieve original password hash value from the database
    4) Compare the two hash values, if they match authenticate, else error
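
The two procedures above can be sketched in Java with java.security.MessageDigest. This is a minimal illustration that uses SHA-1 only because the post names it; the class and method names are hypothetical, and the database read and write are left out. A salted, deliberately slow scheme such as PBKDF2 is the more usual choice today, which this sketch does not attempt.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PasswordHashSketch {
    /** "Store" steps 1-2: turn the user's input into a hash value. */
    static byte[] hash(String password) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            return sha1.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /**
     * "Authenticate" steps 2-4: hash the attempt and compare it with the
     * hash previously stored in the database. The original password is
     * never stored or recovered.
     */
    static boolean matches(String attempt, byte[] storedHash) {
        return MessageDigest.isEqual(hash(attempt), storedHash);
    }
}
```

The value returned by `hash` is what goes into the database column in step 3 of the first procedure; `matches` covers the comparison in step 4 of the second.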

  • What is the best option for storing my iphoto library on the cloud?

    I was curious as to what people are doing (other than backing up to external HD) for backing up the iphoto library?  Anyone using any cloud solution out there for storage?  What are some of the best options?  I have around 30,000 photos in my iphoto library.

    There is no good solution for storing the Library in the Cloud. The amount of data involved means that uploading or downloading is very, very slow. We do see posts on here from people trying to restore from a back up to the Cloud wondering if it's possible to speed up the download currently estimated in days. Running a Library from the cloud is just painfully slow - people have tried it and that is the consensus.
    However, as part of a comprehensive back up plan there is a lot to be said for backing up your Photos to the cloud. Not as good as backing up the whole Library, but as a "last line" you at least have your photos. There are many options: Flickr, Picasa, SmugMug etc. However, check the terms of your account carefully. While most sites have free uploading, you will often find that these uploads are limited in terms of the file size or the bandwidth you can use per month. For access that allows you to upload full size pics with no restrictions you may need to pay.

  • What is the best strategy for wireless printing?

    Hi,
    I just got an HP 6500A wireless all in one printer and have it connected by ethernet cable to our wireless router. My desktop running Vista Home Premium is also connected to the router. Printing from the desktop works fine. We also want to print wirelessly from a couple of Windows laptops, an iPhone and an iPad. I installed HP's Mobile E-print driver on the laptops and it works but I think the size limitation (5 mb) is a problem. I couldn't print a single page pdf, but a Word doc printed fine.
    So, it looks like I could:
    A: Connect the printer to the desktop by USB and share the printer on the network. Possible drawbacks - I think I'd have to leave the desktop on all the time and we haven't had much success sharing our old HP printer.
    B: Install the printer on the network, via ethernet, and use the embedded web server. Then I guess I install the printer software on each computer?
    C: Connect the printer wirelessly. Is this any different from option B other than not using a cable? The printer is located near the router and the desktop so the cable isn't a problem.
    I know it may just come down to individual preference, but I'd be grateful for any advice as to which direction to go!
    Thanks.
    Dirt Gardener

    A is not a great solution.  Printer sharing USB printers have so many downsides, some of which you described.  In addition, only printing can be shared, not scanning or faxing.
    I like B the best.  Ethernet is superior to wireless if you can connect that way - no passwords, no interference and it is generally a faster interface.  Install the printer software on each computer (from the "Support & Drivers" link at the top of this page, not a CD).
    C is OK, but only if you cannot connect by Ethernet.
    I am employed by HP

  • [iPhone] what is the best practice for storing data? SQLite or Keychain ?

    Can't find clear guideline about when and what should I store in Keychain and when to use files, SQLite.
    I need to save large array of data that is configuration of application.
    This configuration should not disappear in event of application upgrade or reinstall. It should be stored in Keychain, right?

    Only use the keychain if you need the added security. Even then it is not meant for large data storage. SQLite allows fast and efficient retrieval of subsets of the data and allows selection with the SQL language. Plists are handy but the entire data must be read in to access any portion so if the amount of data is small this is ideal.

  • What is the best method for bookmarkable URLs

    Hi Everyone,
    I've been working with JSF for a little while now, and I've been using RestFaces (https://restfaces.dev.java.net/) for my bookmarkable URLs, but there are still a few limitations. For example, when the validation phase occurs, the real path to the JSF page is exposed (I'm currently working on a fix, but it has led me to ask some other questions).
    What has everyone's experience been?
    I want to be able to use Pretty URLs as Links, commandButtons (aka, as a result of an action method / or any navigation event).
    So:
    http://mysite.com/app/viewuser/username1
    This URL should bring me to the viewUser page... and if they choose to take an action, "Edit a user" by clicking a command button for example, then they should be redirected to the editUser page, which should still have a pretty URL: http://mysite.com/app/edituser/username1
    Effectively, I want to alias a pretty URL to a page:
    /Login -> /faces/login.jspx
    /viewUser/{username} -> /faces/viewUser.jspx (and also sets the username parameter based on the value parsed from the URL)
    /editUser/{username} -> /faces/editUser.jspx (and also sets the username parameter based on the value parsed from the URL)
    I know PrettyURLPhaseListener helps with this, but it does not enable rendering of pretty URLs after navigation like RESTfaces does.
    Thoughts? Thanks.

    Ok, so I solved my problem by creating a JSF extension.
    PrettyFaces (http://ocpsoft.com/prettyfaces/)
    I wrote it recently and released it as opensource in order to fix some of the major problems with existing JSF URL mapping tools.

  • What is the best practice for storing iPads over the summer?

    Over the summer break do iPads need a maintenance charge or can they be charged to full, stored over the summer, and then a week or so before school charged again for deployment? Thanks.

    Generally, it's not a good idea to charge Li-P batteries fully prior to long-term storage. A more correct method is to have them charged between 40-60% before long-term storage. Be sure to shut off the devices fully by holding the power button down until the Power Off screen appears.

  • What is the best strategy for purchasing at the refurbished store?

    I am hoping to buy my next Apple through the refurbished store and wonder if anyone can share their experience in terms of finding the exact configuration you are looking for.  Today I browsed the store and found the perfect setup.  Apparently I did not click 'add to cart' quickly enough, and the Mac I wanted was gone.  I assume the stock is updated daily? or maybe new items are added once they are released for purchase from Apple? 
    I have no intention of compromising on the specs I want, can anyone help me strategize on how to increase my odds of finding the new refurb Mac I am looking for?

    There really is no strategy. Items put in the Refurb Store come in all shapes and sizes but not by any particular order. If you want a particular configuration then all  you can do is check regularly to see if one appears. Or you can select some other configuration that you can upgrade yourself.
    As for buying refurbs in general, I have several of them all of which have been running perfectly since purchase. Were I to buy a new computer I would probably buy it refurbished to save money.

  • What is the best strategy for sharing iPhoto libraries between two macs?

    I have an iMac and a new Airbook.  I want to keep my iMac as the "main" iPhoto location, but want to also keep copies of the photos on my Airbook. 
    I could copy the library from one to the other to get the photos into the Airbook initially, but then how would I get subsequent ones in there?  Is there a way to sync these two computers so that the photos flow from one to the other?  One-way syncing would be ok...
    Thanks.
    Oh - and I use MobileMe if that helps.

    There is an application that can sync two iPhoto libraries.  It's SyncPhotos
    Here's how it works: 
    1 - it compares the databases and/or album.xml files of Libraries A and B. 
    2 - the files in A that are not in B are imported into B by copying the original files from A into B's Import folder.  
    3 - it then does the same for B. 
    4 - metadata is not copied nor are Faces or Places. 
    It can be used to just copy from A to B and not sync both ways if desired. It will work with 2 libraries that are different versions, i.e. between an iPhoto 08 and iPhoto 09 library.
    If you want to copy selected Events or Album from one to the other and keep the metadata, faces and places intact, you can use the paid version of iPhoto Library Manager.  You would have to manually select the event/album to copy from A to B and do the same for B to A.
    OT

  • What's the best strategy for cropping 1080p video?

    I just shot a panel discussion in 1080p, and because of all the panning back and forth there are parts that could benefit from some cropping (in other words, I would set a keyframe and use the transform tool to enlarge slightly in order to center the speaker in the frame).
    1080p is WAY larger than the intended final product. Should I plow straight ahead and assume that outputting to a smaller resolution will take care of concerns over degradation? Or should I anticipate issues from the start and specify a smaller resolution as part of bringing the video into the timeline?

    not really.
    Let's assume I have auto render (or background rendering) turned off. I'll finish all my transform-tool work, then "render all" while I run down to the corner Starbucks.
    I think in terms of Photoshop. If the canvas is smaller than the clip, then introducing the video to the timeline -or canvas, would be the point when the clip gets repositioned... or, if the clip is OK, then I would just use the transform tool to reduce it to fit the frame.
    Again, thinking in Photoshop terms, if the final exported product is smaller than the original, then using the transform tool to increase the clip size has a less pronounced effect. Alas, in Photoshop, outputting at a smaller size (for web) means you aren't going to see the pixels that got "tore up".
    I'm not sure I can expect the same thing to happen in the wonderful world of video editing.

  • Need help editing for 14:9 broadcast -- what's the best strategy?

    Hi,
    I work for a station which broadcasts in 14:9. I can cope easily with this when using full screen images by visualising, roughly, what will be lost on the left and right. But now I want to use split-screens and image collages using smaller, scaled down pictures, and want to be able to design the material precisely within a 14:9 frame so that none of the material is lost.
    What's the best strategy for this? Is there a cartesian co-ordinate I shouldn't stray beyond -- or better, is there an easy way with a mask, perhaps, to show 14:9 within a 16:9 timeline. (I'll be editing with a 16:9 timeline). Can I somehow create two vertical lines to visually provide the limits of the 14 component of the aspect?
    Many thanks for any help.
    macbook pro   Mac OS X (10.4.7)  

    Well, you could certainly create a mask in PhotoShop that is 720x480 wide, but has a 630x480 hole in it. Then bring it in and set its anamorphic flag and put it on the top Video layer to see a 14:9 window.
    I arrived at 630 by multiplying 720x14 and dividing the result by 16...
    Patrick
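
The arithmetic in the reply above, written out. The second figure, the width hidden on each side, is derived here and assumes the 14:9 window is centred in the 720-pixel-wide frame:

```latex
\[
720 \times \frac{14}{16} = 630, \qquad \frac{720 - 630}{2} = 45
\]
```

Under that assumption the mask covers 45 pixels at the left edge and 45 pixels at the right edge.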

  • What is the best practice for changing view states?

    I have a component with two Pie Charts that display percentages at two specific dates (think start and end values). But, I have three views: Start Value only, End Value only, or show Both. I am using a ToggleButtonBar to control the display. What is the best practice for changing this kind of view state? Right now (since this code was inherited), the view states are changed in an ActionScript function which sets the visible and includeInLayout properties on each Pie Chart based on the selectedIndex of the ToggleButtonBar, but this just doesn't seem like the best way to do this - not very dynamic. I'd like to be able to change the state based on the name of the selectedItem, in case the order of the ToggleButtons changes, and since I am storing the name of the selectedItem for future reference.
    Would using States be better? If so, what would be the best way to implement this?
    Thanks.

    I would stick with non-states, as I have always heard that states are more for smaller components that need to change under certain conditions, like a login screen that changes if the user needs to register.
    That said, if the UI of what you are dealing with is not overly complex, and if it will not become overly complex, maybe states is the way to go.
    Looking at your code, I don't think you'll save much in terms of lines of code.

  • What is the best strategy to save a file in client computer

    I want to save some information in a file in client computer. What is the best strategy to do? There are some ways I can think about. But none of them is good enough for me.
    1. I gave all-permission. So, I can actually write what I want. But, in order to make the program run on all platforms and all client computers, I can't make any assumptions about the file system of the client computer. So, this is not good.
    2. I can write a file into the .javaws directory. But how can I get the file path for this directory? The JNLP API does not give this path to us. I can't think of a way to get this path for all client computers (Windows, Mac, Unix).
    3. To write as a muffin. Seems fine. But I often change server and path. So, once I change the server, the client will lose the saved file, since a muffin is associated with server and path.
    4. I can just open a file with no path. I think J2SE will treat this file in a platform-dependent way. For example, on W2K this file will be put on the Desktop. This is bad.
    Any better idea?

    In the past I have used the Properties class to do things like this. Using load and store, you can read and write key=value pairs.
    I store the file in the user.home directory. You can use System.getProperty("user.home") to get this location.
    No guarantees, but I thought that this user.home property was good for any OS with a home directory concept. If that turns out not to be true, maybe the System property java.io.tmpdir would be more consistent across platforms. This, of course, would be subject to delete by the OS/administrators.
    -Dave
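
A minimal sketch of the approach described in the reply above, using java.util.Properties with load and store and the user.home system property. The class name, the method names and the file name `.myapp.properties` are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

final class ClientSettings {
    /** Default location: a file (hypothetical name) in the user's home directory. */
    static Path defaultFile() {
        return Paths.get(System.getProperty("user.home"), ".myapp.properties");
    }

    /** Writes the key=value pairs to the given file, replacing any previous content. */
    static void save(Path file, Properties props) {
        try (OutputStream out = Files.newOutputStream(file)) {
            props.store(out, "client settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the key=value pairs back; returns an empty set if the file does not exist yet. */
    static Properties load(Path file) {
        Properties props = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return props;
    }
}
```

Typical use would be `ClientSettings.save(ClientSettings.defaultFile(), props)` on exit and `ClientSettings.load(ClientSettings.defaultFile())` on start-up. Passing the path as a parameter keeps the fallback to java.io.tmpdir mentioned above a one-line change.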
