Posted to user@zookeeper.apache.org by Eric Robert <er...@datacratic.com> on 2013/03/08 16:44:30 UTC

Connections issues with ZooKeeper

Hello,

I am experiencing connection issues when many processes try to connect to ZK
at the same time. I quickly found that I needed to increase maxClientCnxns
to cover our use case, but I still get timeouts when connecting. Most of the
clients run from the same machine. I've tried different setups with similar
results, i.e. standalone on the same machine as the clients, standalone on
another machine, or in an ensemble of 5 servers.

For example, when ZK is local and standalone, I start to get a few timeouts
with 100 clients, and the problem gets a lot worse from there: with
something like 1000 clients, most of them can't connect.

I think I reproduced the problem with zk-smoketest with a simple script
that starts multiple instances of the test concurrently. Note that with an
ensemble of 5 servers, I start to get exceptions with very few connections.

Maybe we're missing something in the configuration?

Here it is:

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=4000
server.1=69.90.81.244:2888:3888
server.2=69.90.81.246:2888:3888
server.3=69.90.81.248:2888:3888
server.4=69.90.81.250:2888:3888
server.5=69.90.81.252:2888:3888
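
For reference, the timing values implied by this configuration can be worked out with a short sketch. It assumes the documented defaults of 2 x tickTime and 20 x tickTime for the minimum and maximum session timeouts, since the file sets neither explicitly. The helper names (parse_cfg, derived_timings) are made up for this illustration. Note also that, per the administrator's guide, maxClientCnxns limits connections per client IP address to a single server, so the 4000 here is shared by all clients running on one machine.

```python
# Subset of the configuration above that affects timing and connection limits.
CONFIG = """\
tickTime=2000
initLimit=10
syncLimit=5
maxClientCnxns=4000
"""


def parse_cfg(text):
    """Parse key=value lines into a dict, skipping blanks and comments."""
    cfg = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        cfg[key.strip()] = value.strip()
    return cfg


def derived_timings(cfg):
    """Convert tick-based settings to milliseconds."""
    tick = int(cfg["tickTime"])
    return {
        "init_limit_ms": tick * int(cfg["initLimit"]),
        "sync_limit_ms": tick * int(cfg["syncLimit"]),
        # Assumption: min/maxSessionTimeout are left at their defaults.
        "min_session_timeout_ms": 2 * tick,
        "max_session_timeout_ms": 20 * tick,
    }


if __name__ == "__main__":
    for name, value in sorted(derived_timings(parse_cfg(CONFIG)).items()):
        print("%s = %d" % (name, value))
```

With tickTime=2000, a client asking for a session timeout below 4000 ms or above 40000 ms would have it adjusted by the server into that range.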

Here is my test:

#!/bin/bash
for i in {1..10}
do
   ./zk-latencies.py --root_znode=/zoo-$i --znode_count=10 --servers="ag4.recoset.com:2181" &
   echo "running $i"
   #waiting makes everything good again
   #sleep 1
done
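
Since waiting between launches makes the failures go away, a variant of the script that spreads the connection attempts over a random interval may help narrow down the threshold. This is only a sketch of the workaround, not a fix: the function names and the max_delay value are hypothetical, and the command line is copied from the bash script above.

```python
import random
import subprocess
import time

SERVER = "ag4.recoset.com:2181"  # same server as in the bash script


def build_command(i, server=SERVER, znode_count=10):
    """Build the zk-latencies.py argv for client number i."""
    return [
        "./zk-latencies.py",
        "--root_znode=/zoo-%d" % i,
        "--znode_count=%d" % znode_count,
        "--servers=%s" % server,
    ]


def stagger_delays(count, max_delay=1.0, seed=None):
    """Return one random delay in [0, max_delay] seconds per client."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_delay) for _ in range(count)]


def launch_all(count=10, max_delay=1.0):
    """Start the clients one by one, sleeping a random delay before each."""
    procs = []
    delays = stagger_delays(count, max_delay)
    for i, delay in zip(range(1, count + 1), delays):
        time.sleep(delay)
        procs.append(subprocess.Popen(build_command(i)))
        print("running %d" % i)
    # Wait for every client and collect the exit codes.
    return [p.wait() for p in procs]


if __name__ == "__main__":
    print(launch_all())
```

Lowering max_delay toward 0 approaches the original behaviour where all clients connect at once.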

Here is one of the exceptions I get:

Traceback (most recent call last):
  File "./zk-latencies.py", line 304, in <module>
    asynchronous_latency_test(s, data)
  File "./zk-latencies.py", line 188, in asynchronous_latency_test
    timer2(func, "get     %7d           znodes " % (options.znode_count))
  File "./zk-latencies.py", line 85, in timer2
    func()
  File "./zk-latencies.py", line 183, in func
    cb.waitForSuccess()
  File "/home/eric/code/zk-smoketest/zkclient.py", line 181, in waitForSuccess
    (self.handle, self.rc))
zkclient.ZKClientError: 'asynchronous operation failed on handle 0 with rc -4'
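
The rc value in that message is a return code from the ZooKeeper C client. A small lookup table, using the error constants from the C client's zookeeper.h, makes it readable; the describe_rc helper is just an illustration and is not part of zk-smoketest. Only the system and API-independent codes are listed.

```python
# System-level return codes from the ZooKeeper C client (zookeeper.h).
ZK_ERRORS = {
    0: "ZOK",
    -1: "ZSYSTEMERROR",
    -2: "ZRUNTIMEINCONSISTENCY",
    -3: "ZDATAINCONSISTENCY",
    -4: "ZCONNECTIONLOSS",
    -5: "ZMARSHALLINGERROR",
    -6: "ZUNIMPLEMENTED",
    -7: "ZOPERATIONTIMEOUT",
    -8: "ZBADARGUMENTS",
    -9: "ZINVALIDSTATE",
}


def describe_rc(rc):
    """Return the symbolic name for a return code, if it is in the table."""
    return ZK_ERRORS.get(rc, "unknown rc %d" % rc)


if __name__ == "__main__":
    print(describe_rc(-4))
```

So rc -4 corresponds to ZCONNECTIONLOSS, meaning the connection to the server was lost while the operation was pending, rather than ZOPERATIONTIMEOUT (-7).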

For reference, I seem to get good performance with 1 connection and 10000
znodes:

Connected in 189 ms, handle is 0
Testing latencies on server ag4.recoset.com:2181 using asynchronous calls
created   10000 permanent znodes  in   2155 ms (0.215594 ms/op 4638.358658/sec)
set       10000           znodes  in   1027 ms (0.102703 ms/op 9736.823045/sec)
get       10000           znodes  in   1096 ms (0.109671 ms/op 9118.163178/sec)
deleted   10000 permanent znodes  in   1574 ms (0.157465 ms/op 6350.621245/sec)
created   10000 ephemeral znodes  in   1776 ms (0.177664 ms/op 5628.592681/sec)
watched   10000           znodes  in   1282 ms (0.128248 ms/op 7797.367901/sec)
deleted   10000 ephemeral znodes  in   1006 ms (0.100612 ms/op 9939.141978/sec)
notif     10000           watches in      0 ms (included in prior)
Latency test complete

Thanks!


Éric