Posted to dev@couchdb.apache.org by Arnaud Schoonjans <ar...@student.kuleuven.be> on 2014/04/09 17:11:59 UTC

Couchdb benchmark shows weird behaviour

Hello,

I am a student currently working on my thesis, for which I am 
benchmarking several NoSQL databases. One of the benchmarks I ran on 
couchdb shows some unexpected, counter-intuitive results. I have included a 
graph in the attachment.

The benchmark set-up is as follows:

* A three node couchdb cluster with replication links between each node 
in both directions.
* All update/insert operations are sent to the same couchdb node
* All Read/scan (range queries) operations are load balanced over all 
three nodes (round robin)
* Ektorp is used as the java client library
* The scan operation is implemented by querying the view "_all_docs" 
with a certain startkey and a limit of 100 documents.
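
For readers who want to reproduce the scan operation without Ektorp, here is a minimal sketch of the equivalent HTTP request and of round-robin node selection. The node addresses and the database name are hypothetical (the post does not give them), and this is not the benchmark's actual code; it only builds URLs and sends nothing.

```python
import json
from itertools import cycle
from urllib.parse import urlencode

# Hypothetical node addresses and database name; the post does not state them.
NODES = ["http://node1:5984", "http://node2:5984", "http://node3:5984"]
DB = "benchmark"

def scan_url(node, db, startkey, limit=100):
    """Build a range-scan request against the built-in _all_docs view."""
    # CouchDB expects startkey to be JSON-encoded, so a string key keeps its quotes.
    query = urlencode({"startkey": json.dumps(startkey), "limit": limit})
    return f"{node}/{db}/_all_docs?{query}"

# Round-robin iterator over the three read nodes.
read_nodes = cycle(NODES)

def next_scan_url(startkey):
    """Return the scan URL for the next node in round-robin order."""
    return scan_url(next(read_nodes), DB, startkey)
```

Each call to next_scan_url moves on to the next node, so a node that is cut off by the firewall still receives every third scan.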

The benchmark consists of four parts:

* The first five minutes are warm-up. (not shown in the graph)
* Between timepoint 5min. and 10min. in the graph all nodes are up and 
running normally.
* Between 10min. and 15min. a firewall rule is inserted on one of the 
(read) nodes which blocks incoming and outgoing couchdb traffic (port 
5984).
* From 15min. on, this firewall rule is removed again. So we're back to 
normal operation.

The purpose of the firewall rule in the middle of the benchmark is to 
simulate network-failure. The benchmark test examines how the couchdb 
database reacts to a network partition by looking at the latency of the 
different operations in time.
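
To compare latencies across the four parts of the run, the samples can be bucketed by phase. The sketch below assumes each sample is a tuple of (operation, elapsed minutes, latency in ms); that format is an assumption for illustration, not the benchmark tool's actual output.

```python
from collections import defaultdict
from statistics import mean

def phase(minute):
    """Map elapsed minutes to the benchmark phase described above."""
    if minute < 5:
        return "warm-up"
    if minute < 10:
        return "normal"
    if minute < 15:
        return "partition"
    return "recovered"

def mean_latency_by_phase(samples):
    """samples: iterable of (operation, elapsed_minutes, latency_ms) tuples."""
    buckets = defaultdict(list)
    for op, minute, latency_ms in samples:
        buckets[(op, phase(minute))].append(latency_ms)
    # One mean per (operation, phase) pair.
    return {key: mean(values) for key, values in buckets.items()}
```

Comparing the "normal" and "partition" means per operation gives a numeric version of what the graph shows.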

The weird thing is: the graph shows that read and update operations 
get significantly slower while the firewall rule is present, whereas 
the insert and scan operations don't seem to see any increase in 
latency. The node where the firewall rule is present is only used for 
read and scan operations. So I would expect only the read and scan 
operations to show more latency, and the latency of the other operations 
to stay stable. I ran the same benchmark on several other NoSQL 
databases, and none of them shows the behaviour we see here. I closely 
monitored the behaviour of couchdb, but I didn't find an explanation 
for this phenomenon. So I think it has something to do with the 
architecture of couchdb. Can anyone help me with an architectural 
explanation of why this behaviour shows up?

Thanks in advance,
Arnaud Schoonjans

Re: Couchdb benchmark shows weird behaviour

Posted by Robert Samuel Newson <rn...@apache.org>.

By "a three node couchdb cluster" do you mean BigCouch? If not, then you have three independent couchdb servers, so your findings aren’t that surprising.

B.

On 9 Apr 2014, at 17:36, Alexander Shorin <kx...@gmail.com> wrote:

> On Wed, Apr 9, 2014 at 7:11 PM, Arnaud Schoonjans
> <ar...@student.kuleuven.be> wrote:
>> The purpose of the firewall rule in the middle of the benchmark is to
>> simulate network-failure. The benchmark test examines how the couchdb
>> database reacts to a network partition by looking at the latency of the
>> different operations in time.
> 
> Side note: if you really want to test behavior on unstable networks, use netem:
> http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
> An unreachable destination is a fairly trivial case: all you can test
> with it are timeouts and request retries. Packet loss, corruption and
> duplication, rate limiting and delays are how real unstable networks
> actually behave, especially wifi and 3G.
> 
> About the subject: what did you use for balancing? Are you sure this
> phenomenon isn't a balancer issue, where the balancer tries to reach the
> "failed" node before trying the next one? That could cause the latency.
> 
> As with any benchmark, it would be good to see the numbers and a
> how-to guide for reproducing the test bench and results locally.
> 
> Thanks.
> 
> --
> ,,,^..^,,,

Re: Couchdb benchmark shows weird behaviour

Posted by Alexander Shorin <kx...@gmail.com>.

On Wed, Apr 9, 2014 at 7:11 PM, Arnaud Schoonjans
<ar...@student.kuleuven.be> wrote:
> The purpose of the firewall rule in the middle of the benchmark is to
> simulate network-failure. The benchmark test examines how the couchdb
> database reacts to a network partition by looking at the latency of the
> different operations in time.

Side note: if you really want to test behavior on unstable networks, use netem:
http://www.linuxfoundation.org/collaborate/workgroups/networking/netem
An unreachable destination is a fairly trivial case: all you can test
with it are timeouts and request retries. Packet loss, corruption and
duplication, rate limiting and delays are how real unstable networks
actually behave, especially wifi and 3G.
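
As a rough illustration of the netem suggestion, the sketch below assembles a `tc qdisc ... netem` command line for the impairments listed above. The interface name and all values are placeholders; the function only builds the argument list and does not run it (applying it would need root, e.g. via subprocess.run).

```python
def netem_command(dev, delay_ms=None, loss_pct=None,
                  duplicate_pct=None, corrupt_pct=None, rate=None):
    """Build the argv for adding a netem qdisc on the given interface."""
    argv = ["tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        argv += ["delay", f"{delay_ms}ms"]          # fixed added delay
    if loss_pct is not None:
        argv += ["loss", f"{loss_pct}%"]            # random packet loss
    if duplicate_pct is not None:
        argv += ["duplicate", f"{duplicate_pct}%"]  # packet duplication
    if corrupt_pct is not None:
        argv += ["corrupt", f"{corrupt_pct}%"]      # random bit corruption
    if rate is not None:
        argv += ["rate", rate]                      # bandwidth limit, e.g. "1mbit"
    return argv

# Placeholder values: 100 ms delay and 1% loss on a hypothetical eth0.
print(" ".join(netem_command("eth0", delay_ms=100, loss_pct=1)))
```

The matching `tc qdisc del dev <interface> root` removes the impairment again at the end of the test window.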

About the subject: what did you use for balancing? Are you sure this
phenomenon isn't a balancer issue, where the balancer tries to reach the
"failed" node before trying the next one? That could cause the latency.
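
A toy simulation of that hypothesis: a round-robin client that waits out a connect timeout on the unreachable node before failing over to the next one. The timeout and service times are made-up numbers, and the model does not reflect how Ektorp or any particular balancer actually behaves.

```python
from itertools import cycle

def simulate(requests, nodes, down, connect_timeout_ms, service_ms):
    """Per-request latency for a round-robin client that retries on the next node."""
    if set(nodes) <= set(down):
        raise ValueError("at least one node must be reachable")
    order = cycle(nodes)
    latencies = []
    for _ in range(requests):
        elapsed = 0
        while True:
            node = next(order)
            if node in down:
                # Wait for the connect timeout, then fail over to the next node.
                elapsed += connect_timeout_ms
                continue
            elapsed += service_ms
            break
        latencies.append(elapsed)
    return latencies

# Hypothetical numbers: 1 s connect timeout, 5 ms per successful request.
print(simulate(6, ["a", "b", "c"], {"c"}, 1000, 5))
```

In this model every request that lands on the blocked node pays the full timeout on top of the normal service time, which would show up as a latency jump for whichever operations go through the balancer.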

As with any benchmark, it would be good to see the numbers and a
how-to guide for reproducing the test bench and results locally.

Thanks.

--
,,,^..^,,,