Posted to user@couchdb.apache.org by Brad Schick <sc...@gmail.com> on 2008/06/17 21:11:32 UTC
Bad write performance
I am seeing very poor write performance running CouchDB on Ubuntu Linux
8.04. I built CouchDB from source about a week ago, and I am using the
Python wrapper to access it all on localhost.
I am trying to add over 90,000 documents (each is about 400 bytes) and
finding that I can only add about 16 documents per second. And while
this is happening, my CPU is about 75% idle. I saw this discussion of
the Nagle algorithm
(http://www.cmlenz.net/archives/2008/03/python-httplib-performance-problems),
but I don't think that is the issue since mochiweb_socket_server.erl
now has {nodelay, true}. I've also tried adding nodelay on the client
side and that didn't help.
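For reference, disabling Nagle on the client side comes down to setting
TCP_NODELAY on the underlying socket before issuing requests. A minimal
sketch of just the socket option (in present-day Python spelling, without
the httplib plumbing around it):

```python
import socket

# Create a TCP socket and disable the Nagle algorithm, so small writes
# (such as short HTTP request bodies) are sent immediately instead of
# being coalesced while the stack waits for ACKs.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Verify the option took effect (non-zero means Nagle is off).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
print(nodelay != 0)  # True
sock.close()
```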
Profiling the client app I see that almost all of the time is spent in
httplib.py:getresponse. So there is some bottleneck between the client
and server that is not CPU bound. Any suggestions? I know very little
about Erlang, is there an easy way to profile CouchDB?
-Brad
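Client-side profiling of the kind described above can be done with
Python's standard cProfile and pstats modules. A minimal sketch, where
upload_docs is a stand-in for the real write loop (in the scenario
above, the top cumulative entry would be httplib's getresponse,
showing the client is blocked waiting on the server rather than
burning CPU):

```python
import cProfile
import io
import pstats

def upload_docs():
    # Stand-in for the real document-upload loop.
    total = 0
    for i in range(1000):
        total += i
    return total

profiler = cProfile.Profile()
profiler.enable()
upload_docs()
profiler.disable()

# Print the top entries sorted by cumulative time; this is where the
# wall clock goes, whether in local work or waiting on the network.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```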
Re: Bad write performance
Posted by Brad Schick <sc...@gmail.com>.
> On 06/17/2008 12:57 PM, Brian Whitman wrote:
>
>> On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
>>
>>
>>> I am seeing very poor write performance running CouchDB on Ubuntu Linux
>>> 8.04. I built CouchDB from source about a week ago, and I am using the
>>> Python wrapper to access it all on localhost.
>>>
>>> I am trying to add over 90,000 documents (each is about 400 bytes) and
>>> finding that I can only add about 16 documents per second. And while
>>> this is happening, my CPU is about 75% idle.
>>>
>> Are you using the update method in python (which uses _bulk_docs) ?
>> You need the svn version of the python wrapper for it.
>>
>> On a slow machine (dual proc 1.8 GHz) using the python wrapper I added
>> 100K docs 10K at a time, each with a random string, uuid, float and
>> int in 180 seconds (roughly 550 documents/second). One CPU was pegged
>> at 100% with beam.smp during the update and the python client did not
>> even register in top during the update (it was just waiting for a response).
>>
>> I couldn't increase my chunk size much past 10K -- couch would return
>> a (very long) error if I tried five 20K chunks, for example.
>>
> I have the code from svn, but I haven't tried bulk uploading yet because
> I've been primarily testing schema.py which currently only does single
> document stores. Maybe I'll hook that up to bulk writes and see how it goes.
>
>
Thanks for the tip. I switched to bulk updates and I can now saturate
the CPU and get a max of about 240 docs/second uploaded (and much of
that is in the client). It would be nice to have a tunable write
batching setting in future versions of couchdb.
If anyone is interested, I rearranged the code in schema.py a bit so
that schema classes derive from client.Document. This makes it easy to
call db.update with a list of them.
-Brad
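A rough sketch of that arrangement, using stand-in Document and
Database classes since the details of the svn couchdb-python API may
differ: schema classes derive from a Document base, so a list of them
can be handed to db.update in one bulk call.

```python
class Document(dict):
    """Minimal stand-in for client.Document: a plain dict of fields."""
    def __init__(self, **fields):
        super().__init__(**fields)

class Person(Document):
    """Schema class deriving from Document, as described above."""
    def __init__(self, name, age):
        super().__init__(type="person", name=name, age=age)

class Database:
    """Stand-in for client.Database with a bulk update method."""
    def __init__(self):
        self.docs = {}
        self._next_id = 0

    def update(self, documents):
        # One round trip for the whole batch (mirroring _bulk_docs)
        # instead of one HTTP request per document.
        ids = []
        for doc in documents:
            self._next_id += 1
            doc_id = str(self._next_id)
            self.docs[doc_id] = dict(doc)
            ids.append(doc_id)
        return ids

db = Database()
batch = [Person("alice", 30), Person("bob", 25)]
ids = db.update(batch)  # single bulk call with a list of schema objects
print(len(db.docs))     # 2
```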
Re: Bad write performance
Posted by Brad Schick <sc...@gmail.com>.
On 06/17/2008 12:57 PM, Brian Whitman wrote:
>
> On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
>
>> I am seeing very poor write performance running CouchDB on Ubuntu Linux
>> 8.04. I built CouchDB from source about a week ago, and I am using the
>> Python wrapper to access it all on localhost.
>>
>> I am trying to add over 90,000 documents (each is about 400 bytes) and
>> finding that I can only add about 16 documents per second. And while
>> this is happening, my CPU is about 75% idle.
>
> Are you using the update method in python (which uses _bulk_docs) ?
> You need the svn version of the python wrapper for it.
>
> On a slow machine (dual proc 1.8 GHz) using the python wrapper I added
> 100K docs 10K at a time, each with a random string, uuid, float and
> int in 180 seconds (roughly 550 documents/second). One CPU was pegged
> at 100% with beam.smp during the update and the python client did not
> even register in top during the update (it was just waiting for a response).
>
> I couldn't increase my chunk size much past 10K -- couch would return
> a (very long) error if I tried five 20K chunks, for example.
>
I have the code from svn, but I haven't tried bulk uploading yet because
I've been primarily testing schema.py which currently only does single
document stores. Maybe I'll hook that up to bulk writes and see how it goes.
-Brad
Re: Bad write performance
Posted by Brian Whitman <br...@variogr.am>.
On Jun 17, 2008, at 3:11 PM, Brad Schick wrote:
> I am seeing very poor write performance running CouchDB on Ubuntu
> Linux
> 8.04. I built CouchDB from source about a week ago, and I am using the
> Python wrapper to access it all on localhost.
>
> I am trying to add over 90,000 documents (each is about 400 bytes) and
> finding that I can only add about 16 documents per second. And while
> this is happening, my CPU is about 75% idle.
Are you using the update method in python (which uses _bulk_docs) ?
You need the svn version of the python wrapper for it.
On a slow machine (dual proc 1.8 GHz) using the python wrapper I added
100K docs 10K at a time, each with a random string, uuid, float and
int in 180 seconds (roughly 550 documents/second). One CPU was pegged
at 100% with beam.smp during the update and the python client did not
even register in top during the update (it was just waiting for a response).
I couldn't increase my chunk size much past 10K -- couch would return
a (very long) error if I tried five 20K chunks, for example.
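The chunked upload pattern described here (100K docs, 10K per bulk
request) can be sketched as a small helper; the db.update call in the
comment is assumed to be the couchdb-python bulk method mentioned above:

```python
def chunks(items, size):
    """Yield successive fixed-size slices of a list."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# 100K small documents, uploaded 10K at a time.
docs = [{"n": i} for i in range(100000)]

batches = list(chunks(docs, 10000))
print(len(batches))     # 10
print(len(batches[0]))  # 10000

# In the real client, each batch would be one bulk request:
# for batch in chunks(docs, 10000):
#     db.update(batch)  # one _bulk_docs POST per 10K documents
```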
Re: Bad write performance
Posted by Brad Schick <sc...@gmail.com>.
On 06/17/2008 12:11 PM, Brad Schick wrote:
> I am seeing very poor write performance running CouchDB on Ubuntu Linux
> 8.04. I built CouchDB from source about a week ago, and I am using the
> Python wrapper to access it all on localhost.
>
> I am trying to add over 90,000 documents (each is about 400 bytes) and
> finding that I can only add about 16 documents per second. And while
> this is happening, my CPU is about 75% idle.
>
I'm running these tests on an old machine running a Gnome desktop, so it
is not a clean environment, but it seems like the bottleneck should
still be apparent. So far disk access looks like the most likely issue.
Running a single writer client, I get the following results from 'iostat
-x sdb -k 1'
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          21.00    0.00    7.00    1.00    0.00   71.00

Device:  rrqm/s  wrqm/s    r/s    w/s   rkB/s   wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        0.00    0.00   0.00  78.00    0.00  326.50     8.37     0.77  10.00   9.54  74.40
Although it doesn't look totally saturated, I found a fairly similar
maximum %util running 'stress -d 1 --hdd-bytes 400' (many small disk
writes).
Here is the output of 'vmstat 1' with one writer client running
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0  39224 589712     40 156556    0    0     0   321  502  873 23  9 67  1
 0  0  39224 589656     40 156624    0    0     0   321  388  630 24  6 69  1
 0  0  39224 589492     40 156680    0    0     0   279  460  804 23  6 70  1
 0  0  39224 589476     40 156740    0    0     0   277  370  575 20  8 71  1
The only real differences between that and "idle" (shown below) are the
blocks-out (bo) column and a small amount of iowait.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0  39224 581296     40 167920    0    0     0    10  305  570  3  3 94  0
 0  0  39224 581296     40 167920    0    0     0     0  194  306  3  1 96  0
 0  0  39224 581296     40 167920    0    0     0     0  283  528  3  2 95  0
 0  0  39224 581296     40 167920    0    0     0     0  193  297  3  0 97  0
I'm not sure what else to check from the OS. I found Erlang's fprof
(http://www.erlang.org/doc/man/fprof.html), but I haven't tried to
figure out how to start CouchDB under it. I'll test bulk updates
next.
-Brad