Posted to user@couchdb.apache.org by Brian Karlak <ze...@metaweb.com> on 2009/10/13 06:37:05 UTC

insert performance

Hello All --

I am running CouchDB-0.8.1 on a dual-core Gentoo box.  Things have  
been smooth for several months, but suddenly I noticed that insert  
performance has decreased dramatically, even for an insert into a  
newly created database:

# no database yet ...
$ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}' http://localhost:5984/freeq_brk%2F1/
{"error":"not_found","reason":"missing"}
real	0m0.009s
user	0m0.004s
sys	0m0.004s

# create a database
$ time curl -X PUT http://oat.corp:5984/freeq_brk%2F1/
{"ok":true}
real	0m0.182s
user	0m0.004s
sys	0m0.004s

# insert a document
$ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}' http://localhost:5984/freeq_brk%2F1/
{"ok":true,"id":"6cd41ba6ddfae66fc5f2f468c31411f1","rev":"3389439647"}
real	0m0.117s
user	0m0.004s
sys	0m0.000s

Can anyone provide me with any clues as to what might be going on, or  
even how to debug the situation?  This seems one or two orders of  
magnitude slower than I would expect -- or than I remember seeing.

Many thanks in advance for any advice anyone can provide.

Brian

Re: insert performance

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 13, 2009 at 9:29 AM, Brian Karlak <ze...@metaweb.com> wrote:
>
> On Oct 13, 2009, at 5:37 AM, Paul Davis wrote:
>
>> Have you compacted your production DB recently?
>
> Well, these stats are for a newly created database, so compaction shouldn't
> matter, yes?
>
> But in any case, I have run compaction on the production database, to no
> avail.
>
> One caveat, however: we have one (somewhat funky) use case which creates a
> large number of small databases.  Could the existence of several thousand
> small databases affect performance?
>
> Thanks,
> Brian
>

It could be an issue if you're seeing lots of churn in the set of open
databases. Though, there was some work on that just before and after
0.9.0 or 0.9.1, IIRC. You could test it by slowing traffic and writing
repeatedly to a single database, to see whether each insert is forcing
the database file to be reopened.
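
Something like this would show it (a rough sketch against a throwaway
database; the name is arbitrary):

$ curl -X PUT http://localhost:5984/churn_test
$ time for i in $(seq 1 20); do curl -s -X POST --data "{\"guid\":\"$i\"}" http://localhost:5984/churn_test > /dev/null; done

If the total comes out around 20x your usual single-insert time, every
write is paying the same fixed cost and open-file churn isn't the
culprit.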

I wouldn't have expected insert times to suffer on an empty database
like yours, but I figured it was worth a shot.

Also, 0.10.0 was just officially released ;)

HTH,
Paul Davis

Re: insert performance

Posted by Glenn Rempe <gl...@rempe.us>.
Well there you go!

:-)

On Wed, Oct 14, 2009 at 4:53 PM, Paul Davis <pa...@gmail.com> wrote:
> Glenn,
>
> CouchDB will already create subdirectories for db's with /'s. Though
> it's not as automatic, it'll allow clients to build exactly the
> structure you're thinking of.
>
> HTH,
> Paul

Re: insert performance

Posted by Paul Davis <pa...@gmail.com>.
Glenn,

CouchDB will already create subdirectories for db's with /'s. Though
it's not as automatic, it'll allow clients to build exactly the
structure you're thinking of.
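
For instance (names here are just illustrative, and DATA_DIR stands in
for your configured database directory):

$ curl -X PUT http://localhost:5984/users%2Falice
{"ok":true}
$ ls DATA_DIR/users/
alice.couch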

HTH,
Paul

On Wed, Oct 14, 2009 at 7:48 PM, Glenn Rempe <gl...@rempe.us> wrote:
> Millions of DB's?
>
> Wouldn't you run into filesystem limitations due to the fact that
> CouchDB writes all of its DB's/indexes into a single dir?
>
> e.g. For the ext3 filesystem, "There is a limit of 31998 sub-directories
> per one directory, stemming from its limit of 32000 links per inode."
>
> http://en.wikipedia.org/wiki/Ext3
>
> And my limited knowledge of filesystem internals says that the more
> files you have in a single dir, the longer lookups on those files
> will take.
>
> "The ext2 inode specification allows for over 100 trillion files to
> reside in a single directory, however because of the current
> linked-list directory implementation, only about 10-15 thousand files
> can realistically be stored in a single directory.  This is why
> systems such as Squid (http://www.squid-cache.org ) use cache
> directories with many subdirectories - searching through tens of
> thousands of files in one directory is sloooooooow."
>
> http://answers.google.com/answers/threadview/id/122241.html
>
> Of course this will vary by filesystem in absolute terms, but I think
> the concept is the same for all current file systems. No?
>
> CouchDB might really be able to address this if it did something like
> make subdirs under the couchdb data dir that were derived from
> portions of a hash of the filename.  Using such a 2 or 3 level deep
> dir structure would indeed allow for a huge number of DB's.
>
> e.g. if the db name hashes to a123df4g34fd.couch
>
> Make dirs/files like:
>
> DATA_DIR/a1/23/df/4g34fd.couch  # DB
> DATA_DIR/a1/23/df/.some_view_index_hidden_dir  # view index
>
> No?
>
> On Tue, Oct 13, 2009 at 3:59 PM, Chris Anderson <jc...@apache.org> wrote:
>> On Tue, Oct 13, 2009 at 6:29 AM, Brian Karlak <ze...@metaweb.com> wrote:
>>>
>>> One caveat, however: we have one (somewhat funky) use case which creates a
>>> large number of small databases.  Could the existence of several thousand
>>> small databases affect performance?
>>>
>>
>> We definitely support the many-databases use case (e.g. 1 per user, aka
>> millions of databases). I think there is extra support for that in
>> 0.9.1, and of course 0.10 has only improved from there.
>>
>> Chris
>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
>
>
> --
> Glenn Rempe
>
> email                 : glenn@rempe.us
> voice                 : (415) 894-5366 or (415)-89G-LENN
> twitter                : @grempe
> contact info        : http://www.rempe.us/contact.html
> pgp                    : http://www.rempe.us/gnupg.txt
>

Re: insert performance

Posted by Leo Simons <ma...@leosimons.com>.
On Thu, Oct 15, 2009 at 12:48 AM, Glenn Rempe <gl...@rempe.us> wrote:
> "The ext2 inode specification allows for over 100 trillion files to
> reside in a single directory, however because of the current
> linked-list directory implementation, only about 10-15 thousand files
> can realistically be stored in a single directory.
...
> Of course this will vary by filesystem in absolute terms, but I think
> the concept is the same for all current file systems. No?

Uhm, well, for the record, basically, no :-)

There are file systems that use things like a tree instead of a linked
list for directories.

IOW more powerful file systems like ZFS or GPFS have different characteristics.

Things like running 'ls' probably do get more expensive, and rather
non-linearly, across almost all filesystems, but you probably don't
_really_ have to run that 'ls' :)

All that said, put those /-es in your namespace names and you can
probably stick with EXT2!

ciao,

- Leo

Re: insert performance

Posted by Glenn Rempe <gl...@rempe.us>.
Millions of DB's?

Wouldn't you run into filesystem limitations due to the fact that
CouchDB writes all of its DB's/indexes into a single dir?

e.g. For the ext3 filesystem, "There is a limit of 31998 sub-directories
per one directory, stemming from its limit of 32000 links per inode."

http://en.wikipedia.org/wiki/Ext3

And my limited knowledge of filesystem internals says that the more
files you have in a single dir, the longer lookups on those files
will take.

"The ext2 inode specification allows for over 100 trillion files to
reside in a single directory, however because of the current
linked-list directory implementation, only about 10-15 thousand files
can realistically be stored in a single directory.  This is why
systems such as Squid (http://www.squid-cache.org ) use cache
directories with many subdirectories - searching through tens of
thousands of files in one directory is sloooooooow."

http://answers.google.com/answers/threadview/id/122241.html

Of course this will vary by filesystem in absolute terms, but I think
the concept is the same for all current file systems. No?
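
(Easy enough to sanity-check on whatever filesystem you're running; a
rough sketch, using throwaway paths:)

$ mkdir /tmp/dirtest && cd /tmp/dirtest
$ for i in $(seq 1 20000); do : > "file_$i"; done
$ time ls -f > /dev/null    # raw directory scan, unsorted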

CouchDB might really be able to address this if it did something like
make subdirs under the couchdb data dir that were derived from
portions of a hash of the filename.  Using such a 2 or 3 level deep
dir structure would indeed allow for a huge number of DB's.

e.g. if the db name hashes to a123df4g34fd.couch

Make dirs/files like:

DATA_DIR/a1/23/df/4g34fd.couch  # DB
DATA_DIR/a1/23/df/.some_view_index_hidden_dir  # view index

No?
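
Roughly like this, say (purely hypothetical -- CouchDB doesn't do this
today, and the hashing details here are made up):

$ name="some_db"
$ hash=$(echo -n "$name" | md5sum | cut -d' ' -f1)
$ echo "DATA_DIR/${hash:0:2}/${hash:2:2}/${hash:4:2}/${hash:6}.couch"

Two-character segments keep any one directory small while fanning the
DB's out across 256 subdirs per level.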

On Tue, Oct 13, 2009 at 3:59 PM, Chris Anderson <jc...@apache.org> wrote:
> On Tue, Oct 13, 2009 at 6:29 AM, Brian Karlak <ze...@metaweb.com> wrote:
>>
>> One caveat, however: we have one (somewhat funky) use case which creates a
>> large number of small databases.  Could the existence of several thousand
>> small databases affect performance?
>>
>
> We definitely support the many-databases use case (e.g. 1 per user, aka
> millions of databases). I think there is extra support for that in
> 0.9.1, and of course 0.10 has only improved from there.
>
> Chris
>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>



-- 
Glenn Rempe

email                 : glenn@rempe.us
voice                 : (415) 894-5366 or (415)-89G-LENN
twitter                : @grempe
contact info        : http://www.rempe.us/contact.html
pgp                    : http://www.rempe.us/gnupg.txt

Re: insert performance

Posted by Chris Anderson <jc...@apache.org>.
On Tue, Oct 13, 2009 at 6:29 AM, Brian Karlak <ze...@metaweb.com> wrote:
>
> One caveat, however: we have one (somewhat funky) use case which creates a
> large number of small databases.  Could the existence of several thousand
> small databases affect performance?
>

We definitely support the many-databases use case (e.g. 1 per user, aka
millions of databases). I think there is extra support for that in
0.9.1, and of course 0.10 has only improved from there.

Chris


-- 
Chris Anderson
http://jchrisa.net
http://couch.io

Re: insert performance

Posted by Brian Karlak <ze...@metaweb.com>.
On Oct 13, 2009, at 5:37 AM, Paul Davis wrote:

> Have you compacted your production DB recently?

Well, these stats are for a newly created database, so compaction  
shouldn't matter, yes?

But in any case, I have run compaction on the production database, to  
no avail.

One caveat, however: we have one (somewhat funky) use case which
creates a large number of small databases.  Could the existence of  
several thousand small databases affect performance?

Thanks,
Brian

Re: insert performance

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 13, 2009 at 1:06 AM, Brian Karlak <ze...@metaweb.com> wrote:
> Hi Paul --
>
> Thanks for the reply.  The times are very consistent.  I'm seeing ~120ms
> insert times on the production DB where this all started.  Here's a second
> POST (and a PUT, for good measure) to the same database:
>
> [oat]$ time curl -X POST --data '{"_id":"999999", "guid":"918292819"}'
> http://localhost:5984/freeq_brk%2F1/
> {"ok":true,"id":"561ce26f6ec87c05f77a7124d0a267d9","rev":"3569556693"}
> real    0m0.133s
> user    0m0.004s
> sys     0m0.000s
>
> [oat]$ time curl -X PUT --data '{"_id":"999999", "guid":"918292819"}'
> http://localhost:5984/freeq_brk%2F1/9999
> {"ok":true,"id":"9999","rev":"560195410"}
> real    0m0.129s
> user    0m0.004s
> sys     0m0.000s
>
> The advice to move to 0.10.0 is well-taken, but probably a more involved
> solution.
>
> Brian
>
> On Oct 12, 2009, at 9:51 PM, Paul Davis wrote:
>
>> On Tue, Oct 13, 2009 at 12:37 AM, Brian Karlak <ze...@metaweb.com> wrote:
>>>
>>> Hello All --
>>>
>>> I am running CouchDB-0.8.1 on a dual-core Gentoo box.  Things have been
>>> smooth for several months, but suddenly I noticed that insert performance
>>> has decreased dramatically, even for an insert into a newly created
>>> database:
>>>
>>> # no database yet ...
>>> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
>>> http://localhost:5984/freeq_brk%2F1/
>>> {"error":"not_found","reason":"missing"}
>>> real    0m0.009s
>>> user    0m0.004s
>>> sys     0m0.004s
>>>
>>> # create a database
>>> $ time curl -X PUT http://oat.corp:5984/freeq_brk%2F1/
>>> {"ok":true}
>>> real    0m0.182s
>>> user    0m0.004s
>>> sys     0m0.004s
>>>
>>> # insert a document
>>> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
>>> http://localhost:5984/freeq_brk%2F1/
>>> {"ok":true,"id":"6cd41ba6ddfae66fc5f2f468c31411f1","rev":"3389439647"}
>>> real    0m0.117s
>>> user    0m0.004s
>>> sys     0m0.000s
>>>
>>> Can anyone provide me with any clues as to what might be going on, or
>>> even
>>> how to debug the situation?  This seems one or two orders of magnitude
>>> slower than I would expect -- or than I remember seeing.
>>>
>>> Many thanks in advance for any advice anyone can provide.
>>>
>>> Brian
>>
>> Brian,
>>
>> Can you paste numbers for the second PUT on that database? Also, 0.8.1
>> hasn't been touched in about a year, so if you're noticing sudden
>> differences I'd wager a guess that it's a system difference that's not
>> related to CouchDB.
>>
>> In other news, we're just now releasing 0.10.0, which has some definite
>> performance gains if you're interested in upgrading. The API is
>> definitely different so it may take a bit of effort, but most client
>> libraries have since made the change.
>>
>> HTH,
>> Paul Davis
>
>

Brian,

Have you compacted your production DB recently?
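
For reference, it's a POST to the db's _compact resource; e.g., with
the database from your transcripts:

$ curl -X POST http://localhost:5984/freeq_brk%2F1/_compact
{"ok":true}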

Paul

Re: insert performance

Posted by Brian Karlak <ze...@metaweb.com>.
Hi Paul --

Thanks for the reply.  The times are very consistent.  I'm seeing ~120ms
insert times on the production DB where this all started.  Here's a  
second POST (and a PUT, for good measure) to the same database:

[oat]$ time curl -X POST --data '{"_id":"999999", "guid":"918292819"}' http://localhost:5984/freeq_brk%2F1/
{"ok":true,"id":"561ce26f6ec87c05f77a7124d0a267d9","rev":"3569556693"}
real	0m0.133s
user	0m0.004s
sys	0m0.000s

[oat]$ time curl -X PUT --data '{"_id":"999999", "guid":"918292819"}' http://localhost:5984/freeq_brk%2F1/9999
{"ok":true,"id":"9999","rev":"560195410"}
real	0m0.129s
user	0m0.004s
sys	0m0.000s

The advice to move to 0.10.0 is well-taken, but probably a more  
involved solution.

Brian

On Oct 12, 2009, at 9:51 PM, Paul Davis wrote:

> On Tue, Oct 13, 2009 at 12:37 AM, Brian Karlak <ze...@metaweb.com>  
> wrote:
>> Hello All --
>>
>> I am running CouchDB-0.8.1 on a dual-core Gentoo box.  Things have  
>> been
>> smooth for several months, but suddenly I noticed that insert  
>> performance
>> has decreased dramatically, even for an insert into a newly created
>> database:
>>
>> # no database yet ...
>> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
>> http://localhost:5984/freeq_brk%2F1/
>> {"error":"not_found","reason":"missing"}
>> real    0m0.009s
>> user    0m0.004s
>> sys     0m0.004s
>>
>> # create a database
>> $ time curl -X PUT http://oat.corp:5984/freeq_brk%2F1/
>> {"ok":true}
>> real    0m0.182s
>> user    0m0.004s
>> sys     0m0.004s
>>
>> # insert a document
>> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
>> http://localhost:5984/freeq_brk%2F1/
>> {"ok 
>> ":true,"id":"6cd41ba6ddfae66fc5f2f468c31411f1","rev":"3389439647"}
>> real    0m0.117s
>> user    0m0.004s
>> sys     0m0.000s
>>
>> Can anyone provide me with any clues as to what might be going on,  
>> or even
>> how to debug the situation?  This seems one or two orders of  
>> magnitude
>> slower than I would expect -- or than I remember seeing.
>>
>> Many thanks in advance for any advice anyone can provide.
>>
>> Brian
>
> Brian,
>
> Can you paste numbers for the second PUT on that database? Also, 0.8.1
> hasn't been touched in about a year, so if you're noticing sudden
> differences I'd wager a guess that it's a system difference that's not
> related to CouchDB.
>
> In other news, we're just now releasing 0.10.0, which has some definite
> performance gains if you're interested in upgrading. The API is
> definitely different so it may take a bit of effort, but most client
> libraries have since made the change.
>
> HTH,
> Paul Davis


Re: insert performance

Posted by Paul Davis <pa...@gmail.com>.
On Tue, Oct 13, 2009 at 12:37 AM, Brian Karlak <ze...@metaweb.com> wrote:
> Hello All --
>
> I am running CouchDB-0.8.1 on a dual-core Gentoo box.  Things have been
> smooth for several months, but suddenly I noticed that insert performance
> has decreased dramatically, even for an insert into a newly created
> database:
>
> # no database yet ...
> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
> http://localhost:5984/freeq_brk%2F1/
> {"error":"not_found","reason":"missing"}
> real    0m0.009s
> user    0m0.004s
> sys     0m0.004s
>
> # create a database
> $ time curl -X PUT http://oat.corp:5984/freeq_brk%2F1/
> {"ok":true}
> real    0m0.182s
> user    0m0.004s
> sys     0m0.004s
>
> # insert a document
> $ time curl -X POST --data '{"_id":"12345", "guid":"128736218763"}'
> http://localhost:5984/freeq_brk%2F1/
> {"ok":true,"id":"6cd41ba6ddfae66fc5f2f468c31411f1","rev":"3389439647"}
> real    0m0.117s
> user    0m0.004s
> sys     0m0.000s
>
> Can anyone provide me with any clues as to what might be going on, or even
> how to debug the situation?  This seems one or two orders of magnitude
> slower than I would expect -- or than I remember seeing.
>
> Many thanks in advance for any advice anyone can provide.
>
> Brian

Brian,

Can you paste numbers for the second PUT on that database? Also, 0.8.1
hasn't been touched in about a year, so if you're noticing sudden
differences I'd wager a guess that it's a system difference that's not
related to CouchDB.

In other news, we're just now releasing 0.10.0, which has some definite
performance gains if you're interested in upgrading. The API is
definitely different so it may take a bit of effort, but most client
libraries have since made the change.

HTH,
Paul Davis