Posted to user@couchdb.apache.org by Attila Nagy <br...@fsn.hu> on 2012/04/19 08:28:56 UTC

CouchDB slow response times

Hi,

I'm experimenting with CouchDB 1.2.0 (default settings) on FreeBSD 9 and 
Erlang R14B on an Intel Xeon X5670 @ 2.93GHz.
I've inserted about 50,000 simple entries (one key and a value each) into 
a test database with increasing IDs (1, 2, 3, ...) and tried to fetch 
them with a simple GET on localhost.
Here is what I see in Wireshark (captured to exclude all client-side 
parsing overhead):
07:54:21.410845 HTTP GET /test/1
07:54:21.411955 HTTP/1.1 200 OK
07:54:21.509766 the JSON data

So getting a single document took 0.098921 seconds (about 98.9 
milliseconds) on a completely idle machine.
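
The measurement itself is trivial; for reference, a minimal sketch of the 
client side (the database name and document IDs are from my test setup, 
nothing else is assumed):

import time
import urllib.request

# One plain GET against a default local CouchDB install; document IDs
# are 1, 2, 3, ... as described above.
start = time.time()
resp = urllib.request.urlopen('http://localhost:5984/test/1')
body = resp.read()
print('%d bytes in %.3f ms' % (len(body), (time.time() - start) * 1000.0))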

All subsequent queries show response times in the same range, which is 
just slow.

Is this all CouchDB and Erlang are capable of, or is something wrong in my 
setup? I haven't turned compression off, BTW, but I will measure its effect.

Thanks,

Re: CouchDB slow response times

Posted by Adam Kocoloski <ko...@apache.org>.
Ok, thanks for the clarification.  HTTP pipelining and persistent connections are complementary.  Whether it provides a significant benefit over just using parallel persistent connections depends in large part on the latency between the client and the server (the higher the latency the more pipelining matters).  Regards,
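
To make the distinction concrete: a minimal raw-socket sketch of pipelining might look like this (hypothetical document IDs, default port; a real client would parse each response instead of reading until a timeout). All N requests leave before the first response arrives, so a high-latency link pays the round trip roughly once instead of N times:

import socket

N = 10

# N back-to-back GETs on one persistent connection (HTTP/1.1 keep-alive).
requests = b''.join(
    b'GET /test/%d HTTP/1.1\r\nHost: localhost\r\n\r\n' % i
    for i in range(1, N + 1)
)

sock = socket.create_connection(('localhost', 5984))
sock.sendall(requests)        # all requests go out before any response
sock.settimeout(1.0)
received = b''
try:
    while True:
        chunk = sock.recv(65536)  # responses come back in request order
        if not chunk:
            break
        received += chunk
except socket.timeout:
    pass                      # crude end-of-stream detection, sketch only
sock.close()
print('%d bytes of responses for %d pipelined GETs' % (len(received), N))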

Adam

On Apr 23, 2012, at 3:06 AM, Attila Nagy wrote:

> Hi,
> 
> Yes, the reason is the final setup: CouchDB will have about that much CPU power there, so I would like to benchmark what kind of performance it can deliver.
> Am I right to assume that HTTP pipelining won't give any serious advantage over multiple persistent connections (with no pipelining on them)?
> (I've measured non-persistent and persistent connections: 1700 QPS vs. 2200; each of them drove CouchDB to 100% CPU usage.)

Re: CouchDB slow response times

Posted by Attila Nagy <br...@fsn.hu>.
Hi,

Yes, the reason is the final setup: CouchDB will have about that much CPU 
power there, so I would like to benchmark what kind of performance it can 
deliver.
Am I right to assume that HTTP pipelining won't give any serious 
advantage over multiple persistent connections (with no pipelining on them)?
(I've measured non-persistent and persistent connections: 1700 QPS vs. 
2200; each of them drove CouchDB to 100% CPU usage.)

On 04/20/12 17:55, Adam Kocoloski wrote:
> Hi Attila, I assume you have your reasons for limiting CouchDB to one core, but you should be able to improve concurrent read performance by leveraging a few more of those cores on your server.  This is Erlang, after all ;-)  I've seen BigCouch nodes pretty nearly saturate Dual X5670s given enough concurrent readers.
>
> Also, HTTP pipelining allows you to submit multiple outstanding queries on the wire and is fully supported in CouchDB.  Admittedly finding clients that do it well is not always an easy task.  Regards,
>
> Adam


Re: CouchDB slow response times

Posted by Adam Kocoloski <ko...@apache.org>.
Hi Attila, I assume you have your reasons for limiting CouchDB to one core, but you should be able to improve concurrent read performance by leveraging a few more of those cores on your server.  This is Erlang, after all ;-)  I've seen BigCouch nodes pretty nearly saturate Dual X5670s given enough concurrent readers.

Also, HTTP pipelining allows you to submit multiple outstanding queries on the wire and is fully supported in CouchDB.  Admittedly finding clients that do it well is not always an easy task.  Regards,

Adam

On Apr 20, 2012, at 6:03 AM, Attila Nagy wrote:

> Hi,
> 
> What I need is a multi-site replicated document DB (most of the time a key-value DB would also be fine, but CouchDB views are very handy for the rest, which spares me from building my own indexes) where I can read from and write to every instance at any time, and the last modification wins, for the whole document.
> Also, I don't like read repair; the DB should log the changes and replicate them to the others when they can be reached (last-update conflict resolution is fine, as I said).
> For this specific application the read/write ratio is very high, 5M:1 or more.
> 
> So CouchDB is a perfect fit; my only problem is that it (in this particular case, the read path) should be somewhat faster. Also, a different API would be good, with the attributes of, say, LDAP:
> - binary, for quick processing
> - multiple outstanding queries on the wire
> 
> HTTP is easy to use, but I guess it adds somewhere around 30-50% to the current processing time (are there any exact measurements, maybe?).
> 
> I really think that the April 1 post about switching to Java would bring more of a boost (at least in raw performance; from other perspectives maybe Erlang is a good fit).
> Waiting some hundred milliseconds for GCs is fine with me if they don't happen too often. ;)


Re: CouchDB slow response times

Posted by Attila Nagy <br...@fsn.hu>.
On 04/20/12 14:42, Matthieu Rakotojaona wrote:
> On Fri, Apr 20, 2012 at 12:03 PM, Attila Nagy <br...@fsn.hu> wrote:
>> - binary for quick processing
> There's a thread from last month talking about slow reads, but I
> cannot find it in the archives
> (http://mail-archives.apache.org/mod_mbox/couchdb-user/201204.mbox/browser)
>
> Basically, one of the bottlenecks is the Erlang format <-> JSON
> conversion. You could try to bypass it for a read and send the HTTP
> response with the raw Erlang document, to see if there's a real
> improvement in speed.
>
Could you please help me with how to do that?

Re: CouchDB slow response times

Posted by Matthieu Rakotojaona <ma...@gmail.com>.
On Fri, Apr 20, 2012 at 12:03 PM, Attila Nagy <br...@fsn.hu> wrote:
> - binary for quick processing

There's a thread from last month talking about slow reads, but I
cannot find it in the archives
(http://mail-archives.apache.org/mod_mbox/couchdb-user/201204.mbox/browser)

Basically, one of the bottlenecks is the Erlang format <-> JSON
conversion. You could try to bypass it for a read and send the HTTP
response with the raw Erlang document, to see if there's a real
improvement in speed.


-- 
Matthieu RAKOTOJAONA

Re: CouchDB slow response times

Posted by Attila Nagy <br...@fsn.hu>.
Hi,

What I need is a multi-site replicated document DB (most of the time a 
key-value DB would also be fine, but CouchDB views are very handy for the 
rest, which spares me from building my own indexes) where I can read from 
and write to every instance at any time, and the last modification wins, 
for the whole document.
Also, I don't like read repair; the DB should log the changes and 
replicate them to the others when they can be reached (last-update 
conflict resolution is fine, as I said).
For this specific application the read/write ratio is very high, 5M:1 or 
more.

So CouchDB is a perfect fit; my only problem is that it (in this 
particular case, the read path) should be somewhat faster. Also, a 
different API would be good, with the attributes of, say, LDAP:
- binary, for quick processing
- multiple outstanding queries on the wire

HTTP is easy to use, but I guess it adds somewhere around 30-50% to the 
current processing time (are there any exact measurements, maybe?).

I really think that the April 1 post about switching to Java would bring 
more of a boost (at least in raw performance; from other perspectives 
maybe Erlang is a good fit).
Waiting some hundred milliseconds for GCs is fine with me if they don't 
happen too often. ;)
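
For completeness, the multi-master part itself is easy to set up; this is 
roughly what I have in mind, as a sketch against the standard _replicate 
API (the host names are hypothetical):

import json
import urllib.request

def replicate(server, source, target):
    # POST /_replicate with continuous=true: the server records changes
    # and pushes them to the peer whenever it is reachable.
    body = json.dumps({'source': source, 'target': target,
                       'continuous': True}).encode()
    req = urllib.request.Request(server + '/_replicate', data=body,
                                 headers={'Content-Type': 'application/json'})
    return urllib.request.urlopen(req).read()

# Run both directions so every site accepts reads and writes.
replicate('http://site-a:5984', 'test', 'http://site-b:5984/test')
replicate('http://site-b:5984', 'test', 'http://site-a:5984/test')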

On 04/20/12 10:27, Mike Kimber wrote:
> Performance is relative, and effective performance is very much determined by the use case. For example, we do analytics with CouchDB: it's faster than a traditional RDBMS in many cases (especially if your views are queried regularly) on less hardware (disk space not included, but that's a trade-off, and compression in 1.2 helps greatly here), and it is easier to use for document analysis. However, it may not be a great fit for very high-read use cases currently. If that's your use case then there are other options, e.g. Redis (possibly as a front end to Couch) or, dare I say it here, MongoDB and Couchbase, or numerous other commercial options from in-memory databases to column-oriented databases; but again, it depends on the use case.
>
> You may want to describe your use case, i.e. what you are trying to accomplish, to allow the community to provide informed comment on your observations.
>
> Thanks
>
> Mike


RE: CouchDB slow response times

Posted by Mike Kimber <mk...@kana.com>.
Performance is relative, and effective performance is very much determined by the use case. For example, we do analytics with CouchDB: it's faster than a traditional RDBMS in many cases (especially if your views are queried regularly) on less hardware (disk space not included, but that's a trade-off, and compression in 1.2 helps greatly here), and it is easier to use for document analysis. However, it may not be a great fit for very high-read use cases currently. If that's your use case then there are other options, e.g. Redis (possibly as a front end to Couch) or, dare I say it here, MongoDB and Couchbase, or numerous other commercial options from in-memory databases to column-oriented databases; but again, it depends on the use case.

You may want to describe your use case, i.e. what you are trying to accomplish, to allow the community to provide informed comment on your observations.

Thanks

Mike 

-----Original Message-----
From: Attila Nagy [mailto:bra@fsn.hu] 
Sent: 20 April 2012 08:35
To: user@couchdb.apache.org
Subject: Re: CouchDB slow response times

Without compression:
07:43:03.822390 HTTP GET /test/1
07:43:03.823475 HTTP/1.1 200 OK
07:43:03.919761 the JSON data
so the response time is 0.097371 seconds (97.37 ms).

In the meantime, I've found that at some point CouchDB's HTTP server 
changed its default to leave TCP_NODELAY off, so
socket_options = [{nodelay, true}]
gives a 2.47 ms response time, which is a major improvement.
I could lower that to 2.1 ms by switching to 
null_authentication_handler, which is not a good idea, but it is faster.

On query performance: when I fetch the same documents (one by one, from 
the first ID to the last) from three different machines, with four 
threads on each (so 12 concurrent HTTP GETs can be on the wire), I can 
saturate one CPU core (Xeon X5670 @ 2.93GHz; I've limited CouchDB to one 
core) at 100% and get about 1700 queries/sec.
These are plain HTTP GETs, so no JSON parsing is involved.
Switching to persistent connections gives 2200 queries/sec (again, 
CouchDB maxes out the CPU).

I hope some day CouchDB will be able to deliver on performance too.

Re: CouchDB slow response times

Posted by Attila Nagy <br...@fsn.hu>.
On 04/19/12 08:28, Attila Nagy wrote:
>
> So getting a single document took 0.098921 seconds (about 98.9 
> milliseconds) on a completely idle machine.
>
> All subsequent queries show response times in the same range, which is 
> just slow.
>
> Is this all CouchDB and Erlang are capable of, or is something wrong in 
> my setup? I haven't turned compression off, BTW, but I will measure its 
> effect.
Without compression:
07:43:03.822390 HTTP GET /test/1
07:43:03.823475 HTTP/1.1 200 OK
07:43:03.919761 the JSON data
so the response time is 0.097371 seconds (97.37 ms).

In the meantime, I've found that at some point CouchDB's HTTP server 
changed its default to leave TCP_NODELAY off, so
socket_options = [{nodelay, true}]
gives a 2.47 ms response time, which is a major improvement.
I could lower that to 2.1 ms by switching to 
null_authentication_handler, which is not a good idea, but it is faster.
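
For the record, that's a one-line change in local.ini; in the 1.2 default 
configuration the setting lives in the [httpd] section:

[httpd]
socket_options = [{nodelay, true}]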

On query performance: when I fetch the same documents (one by one, from 
the first ID to the last) from three different machines, with four 
threads on each (so 12 concurrent HTTP GETs can be on the wire), I can 
saturate one CPU core (Xeon X5670 @ 2.93GHz; I've limited CouchDB to one 
core) at 100% and get about 1700 queries/sec.
These are plain HTTP GETs, so no JSON parsing is involved.
Switching to persistent connections gives 2200 queries/sec (again, 
CouchDB maxes out the CPU).
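
The load generator on each machine is essentially this sketch (thread and 
document counts are as described above; the rest is illustrative), with 
one persistent connection per thread:

import threading
import http.client

N_THREADS, N_DOCS = 4, 50000

def worker(tid, counts):
    # One persistent HTTP/1.1 connection per thread, reused for every GET.
    conn = http.client.HTTPConnection('localhost', 5984)
    n = 0
    for i in range(tid + 1, N_DOCS + 1, N_THREADS):
        conn.request('GET', '/test/%d' % i)
        conn.getresponse().read()  # drain the body so the connection is reused
        n += 1
    conn.close()
    counts[tid] = n

counts = [0] * N_THREADS
threads = [threading.Thread(target=worker, args=(t, counts))
           for t in range(N_THREADS)]
for t in threads: t.start()
for t in threads: t.join()
print('fetched %d documents' % sum(counts))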

I hope some day CouchDB will be able to deliver on performance too.