Posted to user@couchdb.apache.org by "Eli Stevens (Gmail)" <wi...@gmail.com> on 2011/06/08 10:36:53 UTC
Upload speed for large attachments
Running the following code on a MacBook Pro, using CouchDBX 1.0.2
(everything local), we're seeing the following output when trying to
attach a file with 10MB of random data:
Code: https://gist.github.com/bc0c36f36be0c85e2a36 (code included in full below)
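The script assumes the 10MB test file already exists; here is a minimal
sketch of how such a file might be generated (the path, size, and .gz
name just mirror what the benchmark sends, so treat them as assumptions):

import os

# One-time setup (hypothetical): write 10MB of raw random bytes. The
# .gz name and the application/gzip content type only mirror what the
# benchmark script sends; the content itself is not actually gzipped.
with open('/tmp/bigfile.gz', 'wb') as f:
    f.write(os.urandom(10 * 2 ** 20))  # 10MB of incompressible data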
Output:
Using curl: 0.168450117111
Using put_attachment: 0.309157133102
post time: 2.5557808876
Using multipart: 2.61283898354
Encoding base64: 0.0497629642487
Updating: 5.0550069809
Server log: https://gist.github.com/a80a495fd35049ff871f (there's a
HEAD/DELETE/PUT/GET cycle that's just cleanup)
The calls in question are:
Using curl: 0.168450117111
1> [info] [<0.27828.7>] 127.0.0.1 - - 'PUT' /benchmark_entity/bigfile/bigfile/bigfile.gz?rev=78-db58ded2899c5546e349feb5a8c0eee4 201

Using put_attachment: 0.309157133102
1> [info] [<0.27809.7>] 127.0.0.1 - - 'PUT' /benchmark_entity/bigfile/smallfile?rev=81-c538b38a8463952f0136143cfa49e9fa 201

Using multipart: 2.61283898354 (post time: 2.5557808876)
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/bigfile 201

Updating: 5.0550069809
1> [info] [<0.27809.7>] 127.0.0.1 - - 'POST' /benchmark_entity/_bulk_docs 201
Profiling our code shows 1.5 sec of CPU usage in our own code (which
covers setup/cleanup that isn't included in the times above) against
11.8 sec of total run time, which roughly matches the PUT/POST times
above. So I'm fairly confident that the bulk of the times above are
spent not in our client code but in CouchDB's handling of the requests.
Why is the form/multipart handler so much slower than a bare PUT of
the attachment? Why is the base64 approach slower still? Is it due to
bandwidth issues, CouchDB CPU usage...?
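One contributing factor worth quantifying for the base64 path: the
encoding alone inflates the payload by a factor of 4/3 before it ever
reaches the wire, and the server then has to JSON-parse and
base64-decode the attachment back out of the document. A quick
standalone check of just the size expansion (this demonstrates only
the deterministic 4/3 growth, not where CouchDB spends its time):

import base64
import os

# Base64 maps every 3 input bytes to 4 output characters, so the 10MB
# attachment becomes ~13.3MB of JSON string data in the request body.
raw = os.urandom(10 * 2 ** 20)    # 10MB of random bytes, as in the test
encoded = base64.b64encode(raw)
print len(raw), len(encoded)      # 10485760 vs. 13981016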
Thanks for any help,
Eli
Full code from: https://gist.github.com/bc0c36f36be0c85e2a36
import base64
import contextlib
import cStringIO
import subprocess
import time

import couchdb
import couchdb.json
import couchdb.multipart


@contextlib.contextmanager
def stopwatch(m=''):
    # Print the wall-clock time taken by the enclosed block.
    t0 = time.time()
    yield
    tdiff = time.time() - t0
    if m:
        print '{}: {}'.format(m, tdiff)
    else:
        print tdiff


def reset(d):
    # Delete and recreate the test document so each run starts from
    # a clean revision.
    try:
        del d['bigfile']
    except couchdb.http.ResourceNotFound:
        pass
    d['bigfile'] = {'foo': 'bar'}
    return d['bigfile']


s = couchdb.Server()
d = s['benchmark_entity']

fn = '/tmp/bigfile.gz'
fn = '/tmp/smallfile'  # NOTE: overrides the line above (leftover from testing)

doc = reset(d)
with stopwatch('Using curl'):
    p = subprocess.Popen([
        'curl',
        '-X', 'PUT',
        'http://localhost:5984/benchmark_entity/{}/bigfile/bigfile.gz?rev={}'.format(
            doc.id, doc.rev),
        # NOTE: -d strips CR/LF bytes from the file; --data-binary would
        # upload it verbatim.
        '-d', '@{}'.format(fn),
        '-H', 'Content-Type: application/gzip'
    ])
    p.wait()

doc = reset(d)
with open(fn, 'rb') as f:
    with stopwatch('Using put_attachment'):
        d.put_attachment(doc, f)

doc = reset(d)
with open(fn, 'rb') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Using multipart'):
        fileobj = cStringIO.StringIO()

        with couchdb.multipart.MultipartWriter(fileobj, headers=None,
                                               subtype='form-data') as mpw:
            mime_headers = {'Content-Disposition': 'form-data; name="_doc"'}
            mpw.add('application/json', couchdb.json.encode(doc), mime_headers)

            mime_headers = {'Content-Disposition':
                            'form-data; name="_attachments"; '
                            'filename="{}"'.format(content_name)}
            mpw.add(content_type, content, mime_headers)

        # Split the buffered message into its Content-Type line, the
        # blank separator line, and the multipart body.
        header_str, blank_str, body = fileobj.getvalue().split('\r\n', 2)

        http_headers = {'Referer': d.resource.url,
                        'Content-Type': header_str[len('Content-Type: '):]}
        params = {}
        t0 = time.time()
        status, msg, data = d.resource.post(doc['_id'], body,
                                            http_headers, **params)
        print 'post time: {}'.format(time.time() - t0)

doc = reset(d)
with open(fn, 'rb') as f:
    content_name = 'bigfile.gz'
    content = f.read()
    content_type = 'application/gzip'
    with stopwatch('Encoding base64'):
        doc['_attachments'] = {content_name: {'content_type': content_type,
                                              'data': base64.b64encode(content)}}
    with stopwatch('Updating'):
        d.update([doc])
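For completeness, the bare-PUT timing can also be reproduced without
couchdb-python at all, which takes the client library out of the
picture entirely; a minimal sketch using only the standard library
(assumes the same local server and the fn/reset definitions above):

import httplib
import time

doc = reset(d)  # fresh document so the rev below is current
with open(fn, 'rb') as f:
    body = f.read()

conn = httplib.HTTPConnection('localhost', 5984)
t0 = time.time()
conn.request('PUT',
             '/benchmark_entity/{}/bigfile.gz?rev={}'.format(doc.id, doc.rev),
             body,
             {'Content-Type': 'application/gzip'})
resp = conn.getresponse()
resp.read()
print 'Using httplib: {} (HTTP {})'.format(time.time() - t0, resp.status)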
Re: Upload speed for large attachments
Posted by "Eli Stevens (Gmail)" <wi...@gmail.com>.
Tilgovi on IRC asked me to open an issue:
https://issues.apache.org/jira/browse/COUCHDB-1192
Cheers,
Eli