Posted to user@couchdb.apache.org by Brad Schick <sc...@gmail.com> on 2008/06/17 21:11:32 UTC

Bad write performance

I am seeing very poor write performance running CouchDB on Ubuntu Linux
8.04. I built CouchDB from source about a week ago, and I am using the
Python wrapper to access it all on localhost.

I am trying to add over 90,000 documents (each is about 400 bytes) and
finding that I can only add about 16 documents per second. And while
this is happening, my CPU is about 75% idle. I saw this discussion of
the Nagle algorithm
(http://www.cmlenz.net/archives/2008/03/python-httplib-performance-problems),
but I don't think that is the issue since mochiweb_socket_server.erl
now has {nodelay, true}. I've also tried adding nodelay on the client
side and that didn't help.
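
For what it's worth, this is roughly what I tried on the client side (a
sketch only; NoDelayHTTPConnection is my own name, not something httplib
or the wrapper provides):

    import socket
    import httplib

    class NoDelayHTTPConnection(httplib.HTTPConnection):
        def connect(self):
            httplib.HTTPConnection.connect(self)
            # disable Nagle so small request bodies go out immediately
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)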

Profiling the client app, I see that almost all of the time is spent in
httplib.py:getresponse, so there is some bottleneck between the client
and server that is not CPU bound. Any suggestions? I know very little
about Erlang; is there an easy way to profile CouchDB?
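
For reference, the profiling setup looked roughly like this (a sketch;
the database name and document contents are stand-ins for my actual code):

    import cProfile
    import pstats
    from couchdb import client

    db = client.Server('http://localhost:5984/')['testdb']

    def upload_docs():
        for i in range(1000):
            # one HTTP round trip per document
            db[str(i)] = {'payload': 'x' * 400}

    cProfile.run('upload_docs()', 'upload.prof')
    pstats.Stats('upload.prof').sort_stats('cumulative').print_stats(10)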

-Brad

Re: Bad write performance

Posted by Brad Schick <sc...@gmail.com>.
> On 06/17/2008 12:57 PM, Brian Whitman wrote:
>   
>> On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
>>
>>     
>>> I am seeing very poor write performance running CouchDB on Ubuntu Linux
>>> 8.04. I built CouchDB from source about a week ago, and I am using the
>>> Python wrapper to access it all on localhost.
>>>
>>> I am trying to add over 90,000 documents (each is about 400 bytes) and
>>> finding that I can only add about 16 documents per second. And while
>>> this is happening, my CPU is about 75% idle.
>>>       
>> Are you using the update method in python (which uses _bulk_docs)?
>> You need the svn version of the python wrapper for it.
>>
>> On a slow machine (dual proc 1.8 GHz) using the python wrapper, I added
>> 100K docs 10K at a time, each with a random string, uuid, float and
>> int, in 180 seconds (roughly 550 documents/second). One CPU was pegged
>> at 100% with beam.smp during the update, and the python client did not
>> even register in top (it was just waiting for a response).
>>
>> I couldn't increase my chunk size much past 10K -- couch would return
>> a (very long) error if I tried five 20K-doc chunks, for example.
>>     
> I have the code from svn, but I haven't tried bulk uploading yet because
> I've been primarily testing schema.py, which currently only does
> single-document stores. Maybe I'll hook that up to bulk writes and see
> how it goes.
>
>   

Thanks for the tip. I switched to bulk updates, and I can now saturate
the CPU and reach a maximum of about 240 docs/second uploaded (much of
that time is spent in the client). It would be nice to have a tunable
write-batching setting in future versions of couchdb.
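
Until then, something like this client-side wrapper works as a stopgap
(just a sketch; BatchWriter is my own invention, not part of the python
wrapper):

    class BatchWriter(object):
        def __init__(self, db, batch_size=1000):
            self.db = db
            self.batch_size = batch_size
            self.pending = []

        def save(self, doc):
            self.pending.append(doc)
            if len(self.pending) >= self.batch_size:
                self.flush()

        def flush(self):
            if self.pending:
                # one _bulk_docs round trip per batch
                self.db.update(self.pending)
                self.pending = []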

If anyone is interested, I rearranged the code in schema.py a bit so
that schema classes derive from client.Document. This makes it easy to
call db.update with a list of them.
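
Roughly like this (a sketch; Post is a made-up example class, and I'm
glossing over the details of the schema.py changes):

    from couchdb import client, schema

    class Post(schema.Document):
        title = schema.TextField()
        body = schema.TextField()

    server = client.Server('http://localhost:5984/')
    db = server['posts']

    # with schema classes deriving from client.Document (my change), a
    # list of instances can be handed straight to db.update (_bulk_docs)
    db.update([Post(title='post %d' % i, body='...') for i in range(10000)])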


-Brad

Re: Bad write performance

Posted by Brad Schick <sc...@gmail.com>.
On 06/17/2008 12:57 PM, Brian Whitman wrote:
>
> On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
>
>> I am seeing very poor write performance running CouchDB on Ubuntu Linux
>> 8.04. I built CouchDB from source about a week ago, and I am using the
>> Python wrapper to access it all on localhost.
>>
>> I am trying to add over 90,000 documents (each is about 400 bytes) and
>> finding that I can only add about 16 documents per second. And while
>> this is happening, my CPU is about 75% idle.
>
> Are you using the update method in python (which uses _bulk_docs)?
> You need the svn version of the python wrapper for it.
>
> On a slow machine (dual proc 1.8 GHz) using the python wrapper, I added
> 100K docs 10K at a time, each with a random string, uuid, float and
> int, in 180 seconds (roughly 550 documents/second). One CPU was pegged
> at 100% with beam.smp during the update, and the python client did not
> even register in top (it was just waiting for a response).
>
> I couldn't increase my chunk size much past 10K -- couch would return
> a (very long) error if I tried five 20K-doc chunks, for example.
>

I have the code from svn, but I haven't tried bulk uploading yet because
I've been primarily testing schema.py, which currently only does
single-document stores. Maybe I'll hook that up to bulk writes and see
how it goes.

-Brad

Re: Bad write performance

Posted by Brian Whitman <br...@variogr.am>.
On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:

> I am seeing very poor write performance running CouchDB on Ubuntu Linux
> 8.04. I built CouchDB from source about a week ago, and I am using the
> Python wrapper to access it all on localhost.
>
> I am trying to add over 90,000 documents (each is about 400 bytes) and
> finding that I can only add about 16 documents per second. And while
> this is happening, my CPU is about 75% idle.

Are you using the update method in python (which uses _bulk_docs)?
You need the svn version of the python wrapper for it.

On a slow machine (dual proc 1.8 GHz) using the python wrapper, I added
100K docs 10K at a time, each with a random string, uuid, float and
int, in 180 seconds (roughly 550 documents/second). One CPU was pegged
at 100% with beam.smp during the update, and the python client did not
even register in top (it was just waiting for a response).

I couldn't increase my chunk size much past 10K -- couch would return
a (very long) error if I tried five 20K-doc chunks, for example.
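
In case it's useful, my test looked roughly like this (a sketch with
made-up field names; db.update in the svn wrapper is what does the
_bulk_docs POST):

    import random
    import uuid
    from couchdb import client

    db = client.Server('http://localhost:5984/')['bulktest']

    docs = [{'name': str(uuid.uuid4()),
             'value': random.random(),
             'count': random.randint(0, 1000)}
            for i in range(100000)]

    # upload in 10K chunks; 20K chunks made couch return a (long) error
    for i in range(0, len(docs), 10000):
        db.update(docs[i:i + 10000])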

Re: Bad write performance

Posted by Brad Schick <sc...@gmail.com>.
On 06/17/2008 12:11 PM, Brad Schick wrote:
> I am seeing very poor write performance running CouchDB on Ubuntu Linux
> 8.04. I built CouchDB from source about a week ago, and I am using the
> Python wrapper to access it all on localhost.
>
> I am trying to add over 90,000 documents (each is about 400 bytes) and
> finding that I can only add about 16 documents per second. And while
> this is happening, my CPU is about 75% idle. 
>   
I'm running these tests on an old machine with a Gnome desktop, so it
is not a clean environment, but the bottleneck should still be
apparent. So far, disk access looks like the most likely culprit. With
a single writer client running, I get the following results from
'iostat -x sdb -k 1':

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.00    0.00    7.00    1.00    0.00   71.00

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sdb               0.00     0.00    0.00   78.00     0.00   326.50     8.37     0.77   10.00   9.54  74.40

Although the disk doesn't look totally saturated, I measured a fairly
similar maximum %util running 'stress -d 1 --hdd-bytes 400' (many small
disk writes).

Here is the output of 'vmstat 1' with one writer client running:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  39224 589712     40 156556    0    0     0   321  502  873 23  9 67  1
 0  0  39224 589656     40 156624    0    0     0   321  388  630 24  6 69  1
 0  0  39224 589492     40 156680    0    0     0   279  460  804 23  6 70  1
 0  0  39224 589476     40 156740    0    0     0   277  370  575 20  8 71  1

The only real differences between that and "idle" (shown below) are the
blocks-out and a small amount of iowait.

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  39224 581296     40 167920    0    0     0    10  305  570  3  3 94  0
 0  0  39224 581296     40 167920    0    0     0     0  194  306  3  1 96  0
 0  0  39224 581296     40 167920    0    0     0     0  283  528  3  2 95  0
 0  0  39224 581296     40 167920    0    0     0     0  193  297  3  0 97  0

I'm not sure what else to check from the OS. I found erlang's fprof
(http://www.erlang.org/doc/man/fprof.html), but I haven't tried to
figure out how to start couchdb under it. I'll test bulk updates
next.

-Brad