Posted to user@couchdb.apache.org by "Eli Stevens (Gmail)" <wi...@gmail.com> on 2011/12/31 01:02:40 UTC
Attachment performance testing script
I've been doing some performance testing of the various ways that
attachments can be uploaded to CouchDB. I think that what I'm seeing
points to some pathological behavior inside couch, but that's just a
guess (I don't really know anything about couch internals). However,
if I'm understanding the implications correctly, there might be the
possibility to make replication much, much faster for large
attachments (by speeding up the multipart API).
To get the data yourself, run 'python makedata.py' once, and then
repeatedly run 'bash do-curls.sh' to get timing information (perhaps
while making performance tweaks, if you're a dev). Code is on github:
https://github.com/wickedgrey/couchdb-attachment-speed
It's a bit janky, but gets the job done. The main takeaway: the
multipart API is just as slow as base64-encoding everything. Expect
to pay roughly a 10x performance penalty for using either API vs.
uploading the attachment separately.
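To make the comparison concrete, here's a rough sketch of what the
"inline base64" style looks like on the client side (doc and attachment
names are made up; this isn't the actual code from the repo). The point
is that the payload is inflated by ~4/3 before CouchDB even sees it,
whereas a standalone PUT to /db/docid/attname sends the raw bytes as-is:

```python
import base64
import json

# 1 MB of stand-in attachment bytes (names/sizes are illustrative only).
data = b"\x00" * 1_000_000

# Inline-base64 style: the attachment is embedded in the document JSON.
doc_b64 = {
    "_id": "example-doc",
    "_attachments": {
        "blob.bin": {
            "content_type": "application/octet-stream",
            "data": base64.b64encode(data).decode("ascii"),
        }
    },
}
body = json.dumps(doc_b64)

# base64 alone inflates the payload by roughly 4/3.
overhead = len(body) / len(data)
print(round(overhead, 2))
```

The multipart API avoids that inflation on the wire, which is part of
why it's surprising that it benchmarks no faster than the base64 path.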
All of the tests were run against a local CouchDB 1.1.1 recently
installed via brew, with delayed commits set to false. Hardware was a
2010 MacBook Pro with 8 GB of RAM, lightly loaded (browser and IDE
running but idle while the tests were run). The general
shape of the timing data didn't change over multiple runs. I haven't
looked into couch memory or CPU usage while handling the uploads.
n   raw         base64      multipart   py b64 encode    py b64 decode
1   0m0.136s    0m0.014s    0m0.013s    0:00:00.000015   0:00:00.000009
2   0m0.014s    0m0.016s    0m0.015s    0:00:00.000012   0:00:00.000011
3   0m0.015s    0m0.017s    0m1.027s    0:00:00.000016   0:00:00.000021
4   0m0.015s    0m0.018s    0m2.020s    0:00:00.000057   0:00:00.000090
5   0m0.017s    0m0.035s    0m2.027s    0:00:00.000361   0:00:00.000801
6   0m0.054s    0m0.202s    0m1.133s    0:00:00.003541   0:00:00.005455
7   0m0.361s    0m1.859s    0m2.318s    0:00:00.043847   0:00:00.059307
8   0m3.531s    0m19.336s   0m15.820s   0:00:00.472431   0:00:00.822210
9   0m36.594s   3m24.152s   5m45.110s   ?                ?
One of the interesting issues that I ran into while constructing the
data was trying to run a gig of text data through the Python JSON
parser. It seemed that a couple of copies of the data were being made
(I'd guess the original data, then an escaped version, and then the
final string?), which slowed things down quite a bit.
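A small sketch of that copy behavior (scaled down to 10 MB so it runs
quickly; the sizes here are illustrative, not the actual 1 GB test):

```python
import json

# Stand-in for a large text payload.
original = "x" * 10_000_000  # 10 MB instead of the ~1 GB case above

# json.dumps builds a brand-new escaped copy (plus surrounding quotes)...
escaped = json.dumps(original)

# ...and json.loads builds yet another full-size copy when parsing.
restored = json.loads(escaped)

# At this point three full-size strings are live simultaneously:
# original, escaped, and restored.
assert restored == original
print(len(original), len(escaped))
```

With a gig of input, holding two or three full-size copies at once
would plausibly account for the slowdown I saw.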
The current state of affairs is especially frustrating for me, since
my use case doesn't permit having documents in an attachment-less
(read: inconsistent) state. My ideal case would be to have the
multipart API:
- Sped up to be roughly the same speed as standalone attachments
- Extended/changed/supplemented to allow for multiple documents at
once, like the bulk API.
In any case, thanks for reading. I hope this helps make CouchDB even
better. :)
Cheers,
Eli