Posted to user@couchdb.apache.org by Kevin Coombes <ke...@gmail.com> on 2012/11/06 12:13:53 UTC

Initial Bulk Upload (was Re: Exist test?)

Hi Dave,

Special thanks for your suggestion on initial bulk upload.  Point [2] 
explains why I always had to compact immediately afterwards, and why 
compacting reduced disk space usage ten-fold.

(And the subject change is so that I and others can maybe find this 
advice again in the future.)

     Kevin

On 11/6/2012 2:15 AM, Dave Cottlehuber wrote:
> On 5 November 2012 19:22, Kevin Burton <rk...@charter.net> wrote:
>> [SNIP]
>>
> Hi Kevin,
>
> [SNIP]
> If you're initially bulk uploading data, I would do 3 things
> differently to what you're currently doing.
>
> 1. assign UUIDs (the document _id) myself
> This is the only enforced unique indexed attribute in a DB, so use it
> well. Put something meaningful in it. It's basically free text **
> within reason.
>
> 2. insert them in sorted UUID order
> CouchDB is a database and sorting matters. Couch uses a B~tree ** and
> so if you insert randomly you spend a lot of time forcing the re-write
> of intermediate nodes for no gain. As Couch is an append-only
> datastore this means several things -
> - wasted space until you compact
> - slower insert performance as you have multiple writes instead of one
> http://horicky.blogspot.co.at/2008/10/couchdb-implementation.html
>
> 3. try inserting the first few docs by hand with curl, and read up on
> the _bulk_docs API, which is much, much faster.
>
> Re your drivers, there are several but I personally don't use any of
> them. The more popular ones (based on my dodgy recollection) are
> listed at http://wiki.apache.org/couchdb/Related_Projects
> Hopefully some of the other Windows folk will pipe up.
>
> A+
> Dave
>
> ** handwavey
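
For anyone finding this thread later, here is a rough Python sketch of Dave's three points: choose the _id yourself, sort by _id, and send the docs through _bulk_docs in batches. It is only an illustration under assumptions of mine. The ID scheme in make_id, the "kind" and "seq" field names, the batch size, and the database URL are all hypothetical, and the sort assumes plain ASCII IDs so that Python's string order matches the order CouchDB stores them in.

```python
import json
import urllib.request


def make_id(record):
    """Build a meaningful, sortable _id (hypothetical scheme: kind plus zero-padded number)."""
    return "{}:{:08d}".format(record["kind"], record["seq"])


def build_batches(records, batch_size=1000):
    """Assign _id values, sort by _id, and split into _bulk_docs request bodies."""
    # Copy each record so the caller's dicts are not modified.
    docs = [dict(record, _id=make_id(record)) for record in records]
    # Sorted inserts avoid rewriting intermediate B-tree nodes over and over.
    docs.sort(key=lambda doc: doc["_id"])
    return [
        {"docs": docs[start:start + batch_size]}
        for start in range(0, len(docs), batch_size)
    ]


def post_batch(db_url, payload):
    """POST one request body to <db_url>/_bulk_docs and return the parsed JSON reply."""
    request = urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To upload, loop over build_batches(records) and call post_batch("http://localhost:5984/mydb", payload) for each one, where the URL is a placeholder for your own database. Posting the batches in the order they are returned keeps the whole upload in ascending _id order.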