What is a good way to load 80 million documents in DocumentDB?

I am having problems loading a large data set. We want to load 80 million documents to test an IoT solution that will eventually hold a lot of data. I followed the instructions for doing a bulk load with a stored procedure, provided by Ryan CrawCour on the Microsoft site:
https://code.msdn.microsoft.com/windowsazure/Azure-DocumentDB-NET-Code-6b3da8af
But it throws exceptions once we load 2,000 - 5,000 documents, even though our documents are only about 80 characters each.
Error: One or more errors occurred., Message: Exception: Microsoft.Azure.Documents.RequestRateTooLargeException, message: {"Errors":["Request rate is large"]},
What is a good way to load large datasets (load backups, migrate data, ...)? Or is DocumentDB just the wrong choice when you have millions of rows to load?
Thanks

Hi,
I have been working on this for about a month, and I am happy to say that my code works. I was able to upload around 0.5 million documents in 20 minutes.
The configuration was DocumentDB at the S3 tier, scaled out to 16 collections for sharding. I think you will need to shard out further; how far depends on how much space each document takes.
The calculation goes like this: if each document averages, say, 2 KB and you have 80 million documents, that comes to 160 GB.
Since one collection can hold at most 10 GB of data, you need at least 16 collections for storage alone. To be on the safe side I would go for three times the storage, so 48 collections. If all of them are at S3, each of the 48 collections gets 2,500 RUs per second, and I am fairly sure that your inserts will then no longer hit the "Request rate is large" exception.
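The sizing above can be written down as a small calculation. This is only a sketch of the arithmetic in this post: the class and method names are made up for illustration, decimal units are assumed (2 KB = 2,000 bytes, 10 GB = 10,000,000,000 bytes), and the 3x factor is just the safety margin suggested above.

```java
// Sketch of the sizing arithmetic from the post; all names are illustrative.
class CollectionSizing {

    /** Minimum number of collections needed to hold the data, rounded up. */
    static long minCollections(long docCount, long avgDocBytes, long capacityBytes) {
        long totalBytes = docCount * avgDocBytes;
        return (totalBytes + capacityBytes - 1) / capacityBytes; // ceiling division
    }

    /** Minimum count multiplied by a headroom factor (3x in the post). */
    static long collectionsWithHeadroom(long docCount, long avgDocBytes,
                                        long capacityBytes, int factor) {
        return minCollections(docCount, avgDocBytes, capacityBytes) * factor;
    }

    public static void main(String[] args) {
        long docs = 80_000_000L;
        long avgDocBytes = 2_000L;          // assumed average of 2 KB per document
        long capacity = 10_000_000_000L;    // 10 GB per collection

        System.out.println(minCollections(docs, avgDocBytes, capacity));             // 16
        System.out.println(collectionsWithHeadroom(docs, avgDocBytes, capacity, 3)); // 48
    }
}
```

If your documents really are only about 80 characters, plug in your own average size; the storage-driven count drops a lot, and throughput rather than storage may become the limiting factor.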
I have come up with this code; hopefully it will help you as well:
https://social.msdn.microsoft.com/Forums/azure/en-US/d036afe2-78ec-45ee-8b0d-297f0f5320fe/azure-documentdb-bulk-insert-using-stored-procedure
For sharding, have a look at:
https://msdn.microsoft.com/en-us/library/dn589797.aspx?f=255&MSPPError=-2147217396
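To give an idea of what sharding across collections means on the client side, here is a minimal sketch of hash-based routing. It does not use the DocumentDB SDK at all; the class, the "telemetry-N" collection naming and the device-id shard key are assumptions made up for this example, and hashing is only one of several possible sharding strategies.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper: picks one of N collections for a document based on its shard key.
class ShardRouter {
    private final int collectionCount;

    ShardRouter(int collectionCount) {
        if (collectionCount <= 0) {
            throw new IllegalArgumentException("collectionCount must be positive");
        }
        this.collectionCount = collectionCount;
    }

    /** Maps a shard key (for example a device id) to an index in [0, collectionCount). */
    int collectionIndexFor(String shardKey) {
        CRC32 crc = new CRC32();
        crc.update(shardKey.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is non-negative too.
        return (int) (crc.getValue() % collectionCount);
    }

    /** Illustrative naming scheme only; use whatever names your collections have. */
    String collectionNameFor(String shardKey) {
        return "telemetry-" + collectionIndexFor(shardKey);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(48);
        System.out.println(router.collectionNameFor("device-42"));
    }
}
```

The same key always lands in the same collection, so reads can be routed the same way as writes. The drawback of plain modulo hashing is that changing the number of collections moves most keys, so pick the count with some headroom up front.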

Similar Messages

  • What is the best way to load 14 million COPA records from BW into HANA?

    I have been managing a project in which we are attempting to load COPA data from BW into HANA using Open Hub and we continue to run into memory allocation errors in BW. We have been able to load 350,000 records.
    Any suggestions on what the best approach would be along with BW memory parameters.
    Your replies are appreciated.
    Rob

    Hello,
    This seems to be an issue in BW caused by the large volume of migrated data. I do not think this is a HANA-related problem. I would suggest posting this message in the BW area; you might get much better support there.
    But to help as much as I can - I found this (see point 7):
    http://help.sap.com/saphelp_nw04/helpdata/en/66/76473c3502e640e10000000a114084/frameset.htm
    7. Specify the number of rows per data package for the data records to be extracted. You can use this parameter to control the maximum size of a data package, and hence also how many main memories need to be made available to structure the data package.
    Hope it helps.
    Tomas

  • What are the good ways to send a big file( 20MB-100MB) file to my friend?

    What are good ways to send a big file (20MB-100MB) to my friend?
    Thanks in advance

    If this is over the internet, iChat is probably your best bet.
    But if you just want a direct transfer,
    plug a FireWire cable into both of your computers, shut down one of them, then hold "T" while pressing the power button. The restarted computer should show up as an external drive on the second computer.

  • What's a good way to manage custom schema for DS  5.1?

    What's a good way to manage custom schema?
    Custom Schema for Object Class and Attributes
    The reason I ask is that I might need to export this custom schema to a different brand of directory server in the future. I just want to make this as painless as possible.
    Right now, I can think of two options:
    1) Create my own LDIF file with my custom attributes and object classes, so that if one day I need to export to another directory server, I can just copy that custom LDIF file over. (Will this work?)
    2) Create a Java application using JNDI. This Java app would read through an XML file and create the object classes and attributes on the fly. (Of course, the XML structure would be predefined by me, so that my Java app can parse it correctly. Will this work?)
    Any more suggestions? I would like to hear more advice.
    Also, I assume that this will work even with replication: all I need to update is the master server, and the slaves will replicate automatically.
    Thank you very much! :)


  • What's a good way to do a thread dump into a separate file

    What is a good way to do a thread dump automatically into a separate file?
    For example: I run a script to do the thread dump, but unfortunately it goes into my stdout log file with the rest of my WebLogic errors.
    Any ideas? I want it in a separate file when I run my script.
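
    One possible approach, as a rough sketch only: a thread dump can also be produced from inside the JVM with the standard Thread.getAllStackTraces() call and written to any file you like. The class name and the default file name below are made up for the example, and the code only sees the threads of the JVM it runs in, so it would have to be deployed inside the server (an assumption about your setup) rather than run as a separate script.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;

// Hypothetical helper: writes the stack traces of all live threads to a file.
class ThreadDumpToFile {

    /** Dumps every live thread of the current JVM into the target file (overwriting it). */
    static void dump(Path target) throws IOException {
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(target, StandardCharsets.UTF_8))) {
            for (Map.Entry<Thread, StackTraceElement[]> entry
                    : Thread.getAllStackTraces().entrySet()) {
                Thread t = entry.getKey();
                out.printf("\"%s\" id=%d state=%s%n", t.getName(), t.getId(), t.getState());
                for (StackTraceElement frame : entry.getValue()) {
                    out.println("    at " + frame);
                }
                out.println(); // blank line between threads
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // The file name is an arbitrary default; pass a path as the first argument to change it.
        Path target = Paths.get(args.length > 0 ? args[0] : "thread-dump.txt");
        dump(target);
        System.out.println("Thread dump written to " + target.toAbsolutePath());
    }
}
```

    If you would rather stay with an external script, the JDK's jstack tool prints the dump to its own standard output, so its output can be redirected to a file instead of ending up in the server log.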

    Do a Google search on "Drobo S" "benchmark."  I don't have a Drobo S, only the regular Drobo.  But here's a guy who tested one on Windows:
    http://mansurovs.com/drobo-s-review-usb-3-0-2nd-generation
    This one has it a bit faster:
    http://the-gadgeteer.com/2011/12/31/drobo-s-storage-array-review/
    Do read up on a few reviews of it, and be absolutely clear that interface speed (i.e. eSATA versus Firewire versus Thunderbolt) is NOT the same as the performance of the system.  The Drobo cannot keep up with any interfaces... at least the Drobo and the Drobo S cannot.
    I am not using the FS model which is a NAS.  I am using the plain old "Drobo" which is slower than the Drobo S, but that's not to say that the Drobo S is fast, because it is not.
    The Drobo in theory is really attractive: Dead simple to manage, can mix and match drive sizes, offers you some data protection, etc.  However do note that protected storage is not, in and of itself, a backup.  You need other backups besides just the data on the Drobo.  And, because it's so slow, it's really not a great fit for photo storage.  See this review from a guy who used to think the Drobo was great for that and then appended his review:
    http://www.stuckincustoms.com/drobo-review/
    To be as clear as possible, IMO the BEST backup strategy with something like Aperture (so long as your managed Aperture library is of a manageable size, like < 800 GB), is to get a few small portable Firewire 800 drives and keep vaults on each one.  They are great because they are easy to use, to have with you, are bus powered, and you WILL offsite them.

  • Is this a good way for loading and handling an xml file?

    I'm new to xml files, but it seems to me that a good way to handle them may be this:
    - create an xmltype table with a unique xmltype column
    - load the xml file in the xmltype column of the table
    - write a procedure that scans the whole column of the table, using the extractvalue built-in function to insert the different nodes of the XML file (just loaded) into the final tables (linking the parent and child tags correctly with foreign keys)
    Does this seem to you a good way to load the nodes of an XML file into a relational database?
    Thanks!

    Is this the 10gR2 Express Edition you mentioned over in "How to load a XML file into a table", or a different version?

  • What's The Best Way to Load a Replacement iPod Touch?

    I just received a replacement iPod Touch 3G for my daughter. Her iTunes library for the Touch that we returned to Apple is on her MacBook Pro. What is the best way to load the new Touch? She plans on using the same name for it. Should I restore from a previous backup or just plug it in and let it sync? Do I need to set anything in iTunes for disk management?
    Thank you,
    Bruce

    If she wants it to be like her old one, restore from the previous backup. If there is anything she wants to change, though, now would be a perfect time to just resync it and start over. It really depends on what she wants.

  • What is the good way to charge the battery

    I don't know if this topic has been answered before.
    I want to know what a good way to charge the battery on a MacBook Pro is. Should I start charging when 50%, 20%, or 10% is left? And is it good to leave the charger plugged in after the battery is fully charged?
    Thanks, guys

    Everything you need to know about your battery is explained here:
    http://discussions.apple.com/thread.jspa?threadID=1764220

  • What is the best way to work with Word documents in The InDesign CS4???

    I work in Microsoft Word 2007 and all my documents have *.doc format.
    What is the best way to work with Word documents in InDesign CS4???
    David Blatner says to avoid copying and pasting text from Word instead of placing it (Ctrl+D).
    How about pasting RTF or Text Document???
    I want to make a book layout in ID CS4, and its main feature is that the left page has text and the right page has graphics.
    So, as I understand it, to place the text on each page I must create, for example, 70 Word documents and place each one on one of the 70 left pages?
    That looks like a waste of time. Is there another way of making such a layout? What kind?

    It's best to place any text.
    You can have all of your text in one file and use auto-flow to add threaded text frames and pages as required (Hold down the Shift key when you click the loaded text cursor), but it's a little non-standard to have the thread only on one side of the spread from the auto-flow perspective, so you'll have to set up properly.
    This is one case where a master text frame will work to your advantage. On your master page spread, add a text frame to the left page, but not to the right (or at least not threaded to one on the right -- for some other project you might actually want two independent text threads). Hold the loaded cursor over a frame on the left side of a document page and auto-flow. ID will add new spreads as necessary, but only put the text on the left side.
    Peter

  • What's the best way to load balance multiple protocols on one vserver?

    Hi,
    We have a CSM blade on a 6513, in bridge mode. I'm just wondering what is the best way to serve HTTP and HTTPS (or any two or more ports) from the same group of servers. As I see it, we have two options:
    1. Don't set a port on the vserver, so it is load balancing "any" or "tcp". This is easy but I want to be sure there isn't a downside to this, other than the obvious security issue.
    2. Create multiple vservers and point them at the same serverfarm. I tried this and I got some odd results with the health checks.
    Any ideas? Thanks a lot.

    You listed the only two options available.
    The advantage of solution #2 is that you can apply a specific config for each protocol, e.g. for HTTP you can turn on 'persistent rebalance' if needed.
    If you want to use specific probes (not ICMP), it is also good practice to create a different serverfarm for each protocol.
    That way, if the HTTP service goes down but the server does not, the other protocols can still be load balanced.
    Regards,
    Gilles.
    Thanks for rating this answer.

  • What's most effecient way to load dvd op sys to G-4 with only cd capab ?

    Have 2 Macs. Recently upgraded newer one to Tiger (full version). Would like to use my original 10.2.7 OS to load on my older machine but it only has CD capabilities. What's most efficient way to do this? (I'm assuming it's legal to do so since I bought a new version for my new machine).
    PowerMac3,6   Mac OS X (10.4.5)  

    Hi Bill, and welcome to the discussions.
    Since you purchased a version of Tiger for the newer Mac, what you want to do would not violate the spirit of the licensing agreement (one install per copy of the software).
    That said, what you want to do will likely run into the technical limitations of your original 10.2.7 installation DVD. Grey installation DVDs shipped with specific hardware only contain the drivers for that specific hardware. In other words, if the newer Mac is a G4 MDD tower with FireWire 800 and a dual 1GHz processor, that's what the DVD will be expecting when it tries to install the system software. If your older machine is a G3 iMac, the DVD won't recognize the processor, CD drive, CRT monitor, or anything else on the logic board. It simply doesn't have the drivers required to support the different hardware.
    If the older Mac has FireWire, you can install your original 10.2.7 on it, but there's no guarantee it'll work. Here's how:
    1. Boot the old mac into Target Disk Mode (hold the T key while booting; when done, you'll get an orange FireWire icon bouncing around on a blue screen);
    2. Connect the computers using FireWire cable, and the old mac should mount as an external hard drive;
    3. Reboot the new mac from the 10.2.7 install DVD;
    4. When it comes time to select the destination for the install, select the old Mac;
    5. When finished, turn off both computers, and disconnect the FireWire cable;
    6. Restart the old mac, and if you're really lucky, and the hardware on the two macs isn't too different, it'll boot to 10.2.7;
    7. Download the 10.2.8 Combo Update and install.
    Just remember: the greater the difference in hardware specifications between the two macs, the greater the likelihood that this will not work.
    Good luck!
    Andrew

  • What's a good way to dynamically allow shortened commands?

    I have a BufferedReader set up that attaches to System.in,
    so when I check the BufferedReader, it will retrieve any commands entered by a user.
    I will have a hash table for commands set up like so:
    "north", "dn"
    "west", "dw"
    "south", "ds"
    "east", "de"
    "follow", "follow"
    "flee", "flee"
    "flip", "flip"
    So, what I want is to be able to check all the commands and allow each one to be shortened to the smallest number of characters that still distinguishes it from the others.
    Therefore, for the example hash list, if I typed in "fle", the code would select "flee". Or, if I typed "fo", the code would select "follow". But if I typed in "f", then the code would respond with a "command not found" error, because "f" could be follow, flip or flee. I want to make it so that I can type "follow", "follo", "foll", "fol", or "fo" to select the same "follow" command.
    Has anyone found a good way of going about this? If my description is confusing, I'll try to re-explain in greater detail.

    Demo: I'm using the nul character to represent the end of the word, so that the data structure can represent that "hell" and "hello" are both in the vocabulary:
    import java.util.*;

    class Node {
        private SortedMap<Character, Node> children = new TreeMap<Character, Node>();

        // 0 <= index <= word.length()
        private void add(String word, int index) {
            if (index == word.length()) {
                // the nul key marks "a word ends here"
                children.put(Character.valueOf('\u0000'), null);
            } else {
                char ch = word.charAt(index);
                Node child = children.get(ch);
                if (child == null) {
                    children.put(ch, child = new Node());
                }
                child.add(word, index+1);
            }
        }

        public void add(String word) {
            if (word == null || word.length()==0)
                throw new IllegalArgumentException();
            add(word, 0);
        }

        public String toString() {
            return children.toString();
        }
    }

    public class Example {
        public static void main(String[] args) throws Exception {
            Node root = new Node();
            root.add("hello");
            root.add("how");
            root.add("who");
            root.add("hell");
            System.out.println(root.toString());
        }
    }
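
    The demo above only stores the vocabulary. As a rough sketch of the lookup the question asks for, here is a different and simpler technique (a sorted set instead of a trie): all commands that start with the typed text form one contiguous range in sorted order, so the abbreviation is valid exactly when that range holds a single command. The class name and the choice to return null for "not found or ambiguous" are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helper: resolves a typed abbreviation to the one command it can mean.
class CommandResolver {
    private final TreeSet<String> commands = new TreeSet<>();

    CommandResolver(Collection<String> names) {
        commands.addAll(names);
    }

    /** Returns the matching command, or null if there is none or the input is ambiguous. */
    String resolve(String input) {
        if (input == null || input.isEmpty()) {
            return null;
        }
        if (commands.contains(input)) {
            return input; // an exact match always wins
        }
        // Every string starting with `input` sorts between input and input + '\uffff'.
        SortedSet<String> matches = commands.subSet(input, input + Character.MAX_VALUE);
        return matches.size() == 1 ? matches.first() : null;
    }

    public static void main(String[] args) {
        CommandResolver resolver = new CommandResolver(
                Arrays.asList("north", "west", "south", "east", "follow", "flee", "flip"));
        System.out.println(resolver.resolve("fle")); // flee
        System.out.println(resolver.resolve("fo"));  // follow
        System.out.println(resolver.resolve("f"));   // null (follow, flee or flip)
    }
}
```

    The resolved name can then be used as the key into the existing command hash table, so the table itself does not need to change.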

  • What is a good way for a heavy iMovie HD user to learn iMovie'09 ?

    Hi Guys,
    I have been using iMovie HD to edit my videos since mid 2005. I understand that Apple will, sooner or later, stop supporting iMovie HD with updates etc (it may have done so already).
    I'm doing event and safety videos for the company I'm working for. I tried to edit a video in iMovie'08 and found it too different from iMovie HD and it was difficult for me to "unlearn" iMovie HD.
    What would be a good way to learn iMovie '09 ?
    I would appreciate any tips & pointers. Thanks in advance.
    Sincerely,
    Azman

    I think the best way is to jump in and make a movie. If you get frustrated about something you know you should be able to do, there is probably a way to do it and you can ask here.
    Also, there are some very good video tutorials here: http://www.apple.com/findouthow/movies
    I recommend you start with the iMovie 08 videos because they cover some of the basics, and Apple has not gotten around to doing some of these topics for iMovie 09. Especially watch the videos for handling audio and keywords. Then watch the iMovie 9 videos.
    The biggest thing is getting used to a different metaphor. iMovie 6 uses the timeline, scotch tape and scissors metaphor, while iMovie 09 uses a storyboard, copy and paste metaphor more like a word processing program. Both metaphors work. Both do the same thing. But it can be frustrating making the switch. Good luck.

  • [SOLVED] What is the best way to load iptables/nftables on boot?

    Hi -
    New Arch user migrating from Ubuntu/Debian.  I'm used to loading netfilter iptables rules using an /etc/network/interfaces file.  What is the best way to do this using netctl/systemd?
    An RTFM of the wiki and a Google search didn't provide an answer, but maybe I'm just looking in the wrong places.
    Last edited by pgoetz (2014-06-21 08:32:01)

    Ah, OK -- got it.  I can just blindly enable the iptables.service, and systemd will make sure the interfaces are up before running the service.  How cool.  I'm still getting used to the luxury of not having to worry about stuff like this myself.  The only minor issue is it looks like I'll have to get the systemd nftables service out of AUR.
    Last edited by pgoetz (2014-02-24 19:58:53)

  • I want to load movies in to my iPad through itune in my pc that runs window XP. First of all I am not able to add the movies to the itune ,leave aside sync to iPad .can some one suggest what is the simplest way to load movies in my iPad.

    I have plenty of DVDs that I load onto my laptop to watch movies while traveling; however, I am not able to do that on my iPad.
    I would appreciate it if someone could guide me to the simplest way of loading the movies onto my iPad. I tried the conventional procedure of copying them into iTunes and then syncing to the iPad, but I failed at the first step (I could not load or copy them into iTunes), let alone the sync to the iPad.
    Do I need to shell out some 20-odd dollars for conversion software that would help?
    Is Apple trying to make money by not making this simplest of things possible for the user? (Compare Windows, where everything is possible with a little IQ on the user's part.)
    If so, I would tell many of my friends to refrain from spending some $1,000 on a device like this and to look instead at a good Droid, which is catching up so well.

    If you are the only user on your computer you probably don't have multiple user accounts set up and can disregard that.  If you are using iTunes 11 go to View>Show Sidebar.  Now see if your iPad appears under Devices on the left side when you connect it.  If it does, click on the name of your iPad on the left side and your iTunes sync settings options will appear in folders with tabbed headings to the right.
    If it doesn't appear on the left side, follow the troubleshooting steps shown in this article: http://support.apple.com/kb/TS1538.
