Posted to user@zookeeper.apache.org by Eric Robert <er...@datacratic.com> on 2013/03/08 16:44:30 UTC
Connections issues with ZooKeeper
Hello,
I am experiencing connection issues when many processes try to connect to ZK
at the same time. I quickly found that I needed to increase maxClientCnxns
to cover our use case, but I still get timeouts when connecting. Most of the
clients run from the same machine. I've tried different setups with similar
results, i.e. standalone on the same machine as the clients, standalone on
another machine, or in an ensemble of 5 servers.
For example, when ZK is local and standalone, I start to get a few timeouts
with 100 clients, and the problem gets much worse from there: with
something like 1000 clients, most of them can't connect.
I think I reproduced the problem with zk-smoketest with a simple script
that starts multiple instances of the test concurrently. Note that with an
ensemble of 5 servers, I start to get exceptions with very few connections.
Maybe we're missing something in the configuration?
Here it is:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=4000
server.1=69.90.81.244:2888:3888
server.2=69.90.81.246:2888:3888
server.3=69.90.81.248:2888:3888
server.4=69.90.81.250:2888:3888
server.5=69.90.81.252:2888:3888
Here is my test:
#!/bin/bash
for i in {1..10}
do
./zk-latencies.py --root_znode=/zoo-$i --znode_count=10 --servers="ag4.recoset.com:2181" &
echo "running $i"
#waiting makes everything good again
#sleep 1
done
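As a rough sketch, the same launcher can be written in Python with the delay between starts exposed as a parameter, which makes it easier to find how small the gap can get before timeouts appear. The helper names (build_command, launch_all) and the stagger_seconds parameter are hypothetical; only the zk-latencies.py options and the server address come from the script above.

```python
import subprocess
import time

# Server address taken from the shell script above.
SERVERS = "ag4.recoset.com:2181"


def build_command(i, znode_count=10, servers=SERVERS):
    """Return the zk-latencies.py command line for instance number i."""
    return [
        "./zk-latencies.py",
        "--root_znode=/zoo-%d" % i,
        "--znode_count=%d" % znode_count,
        "--servers=%s" % servers,
    ]


def launch_all(count=10, stagger_seconds=1.0):
    """Start `count` instances, sleeping between starts, and return their exit codes.

    stagger_seconds=0 reproduces the fully concurrent start of the shell script;
    stagger_seconds=1 corresponds to its commented-out `sleep 1`.
    """
    procs = []
    for i in range(1, count + 1):
        procs.append(subprocess.Popen(build_command(i)))
        print("running %d" % i)
        time.sleep(stagger_seconds)
    # Wait for every instance so failures show up as non-zero exit codes.
    return [p.wait() for p in procs]
```

Calling launch_all(10, 0.0) and then launch_all(10, 1.0) from the zk-smoketest directory should show the difference between the two cases.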
Here is one of the exceptions I get:
Traceback (most recent call last):
File "./zk-latencies.py", line 304, in <module>
asynchronous_latency_test(s, data)
File "./zk-latencies.py", line 188, in asynchronous_latency_test
timer2(func, "get %7d znodes " % (options.znode_count))
File "./zk-latencies.py", line 85, in timer2
func()
File "./zk-latencies.py", line 183, in func
cb.waitForSuccess()
File "/home/eric/code/zk-smoketest/zkclient.py", line 181, in waitForSuccess
(self.handle, self.rc))
zkclient.ZKClientError: 'asynchronous operation failed on handle 0 with rc -4'
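For reference, the rc value in that message is a ZooKeeper C client error code, and -4 is ZCONNECTIONLOSS, i.e. the connection to the server was lost while the operation was pending. A small lookup like the one below can make such codes readable in logs. The table is a subset of the ZOO_ERRORS enum in zookeeper.h, and describe_rc is a hypothetical helper, not part of zk-smoketest.

```python
# Subset of the ZOO_ERRORS enum from the ZooKeeper C client header (zookeeper.h).
ZOO_ERRORS = {
    0: "ZOK",
    -1: "ZSYSTEMERROR",
    -2: "ZRUNTIMEINCONSISTENCY",
    -3: "ZDATAINCONSISTENCY",
    -4: "ZCONNECTIONLOSS",
    -5: "ZMARSHALLINGERROR",
    -6: "ZUNIMPLEMENTED",
    -7: "ZOPERATIONTIMEOUT",
    -8: "ZBADARGUMENTS",
    -9: "ZINVALIDSTATE",
}


def describe_rc(rc):
    """Return the symbolic name for a return code, or a placeholder if unlisted."""
    return ZOO_ERRORS.get(rc, "unknown rc %d" % rc)


print(describe_rc(-4))
```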
For reference, I seem to get good performance with 1 connection and 10000
nodes:
Connected in 189 ms, handle is 0
Testing latencies on server ag4.recoset.com:2181 using asynchronous calls
created 10000 permanent znodes in 2155 ms (0.215594 ms/op 4638.358658/sec)
set 10000 znodes in 1027 ms (0.102703 ms/op 9736.823045/sec)
get 10000 znodes in 1096 ms (0.109671 ms/op 9118.163178/sec)
deleted 10000 permanent znodes in 1574 ms (0.157465 ms/op 6350.621245/sec)
created 10000 ephemeral znodes in 1776 ms (0.177664 ms/op 5628.592681/sec)
watched 10000 znodes in 1282 ms (0.128248 ms/op 7797.367901/sec)
deleted 10000 ephemeral znodes in 1006 ms (0.100612 ms/op 9939.141978/sec)
notif 10000 watches in 0 ms (included in prior)
Latency test complete
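The two figures in parentheses follow from the operation count and the elapsed time: ms/op is elapsed time divided by the number of operations, and ops/sec is 1000 divided by ms/op. A minimal sketch of that arithmetic is below; rates is a hypothetical helper, and because the printed elapsed time is rounded to whole milliseconds, its results only approximate the printed values.

```python
def rates(op_count, elapsed_ms):
    """Return (ms_per_op, ops_per_sec) for op_count operations in elapsed_ms."""
    ms_per_op = float(elapsed_ms) / op_count
    ops_per_sec = 1000.0 / ms_per_op
    return ms_per_op, ops_per_sec


# First line of the output above: 10000 permanent znodes created in 2155 ms.
ms_per_op, ops_per_sec = rates(10000, 2155)
print("%f ms/op %f/sec" % (ms_per_op, ops_per_sec))
```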
Thanks!
Éric