You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@zookeeper.apache.org by Deepinder Singh Setia <ds...@juniper.net> on 2013/08/09 18:13:21 UTC

OperationTimeoutException error

I am using Kazoo client and see the following error:

Aug  9 07:07:20 a2s1 python[2085]: File "/usr/lib/python2.7/site-packages/kazoo/client.py", line 894, in set
Aug  9 07:07:20 a2s1 python[2085]: return self.set_async(path, data, version).get()
Aug  9 07:07:20 a2s1 python[2085]: File "/usr/lib64/python2.7/site-packages/gevent/event.py", line 223, in get
Aug  9 07:07:20 a2s1 python[2085]: raise self._exception 
Aug  9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation timeout

zookeeper logs around the error time:

2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 2291ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
…
2013-08-09 07:07:19,827 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1406356c3bc0001, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:722)
2013-08-09 07:07:19,828 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:54698 which had sessionid 0x1406356c3bc0001
2013-08-09 07:07:20,246 [myid:] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x1406356c3bc0000, likely client has closed socket
        at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:224)
        at java.lang.Thread.run(Thread.java:722)
2013-08-09 07:07:20,246 [myid:] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1000] - Closed socket connection for client /127.0.0.1:54579 which had sessionid 0x1406356c3bc0000
2013-08-09 07:07:21,583 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-08-09 07:07:21,680 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@180] - Unexpected Exception:
java.nio.channels.CancelledKeyException

[root@a2s1 zookeeper]# grep fsync /var/log/zookeeper/zookeeper.out 
2013-08-09 07:06:52,280 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 1724ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-08-09 07:06:58,315 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 2378ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-08-09 07:07:01,389 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 1113ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 2291ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2013-08-09 07:07:21,583 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
[

Could the client (Kazoo) be timing out because of fsync delay? What parameter would control duration for OperationTimeoutException that I can perhaps increase to verify? There is only ZooKeeper client and the load isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper configuration is default. Kazoo client params are also default. 

Thanks
Deepinder



Re: OperationTimeoutException error

Posted by Deepinder Singh Setia <ds...@juniper.net>.
Hanno, thanks for your feedback. I have a better understanding of the
problem now. 

I am not using a dedicated disk for transaction log or dedicated machine
for Zookeeper. Will seriously consider latter first (which will
automatically solve the former issue).

Meanwhile I have increased the session timeout as a work around. I am able
to do that because my sundry clients do not communicate with Zookeeper
directly, instead go through a proxy process. Thus, it is possible to
increase the session timeout for essentially a single ZooKeeper client.

I am also going to look at client side retries along with tuning of GC
parameters to further alleviate the problem.

Thanks
Deepinder

On 8/10/13 12:02 PM, "Hanno Schlichting" <ha...@hannosch.eu> wrote:

>Hi.
>
>On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
>> Aug  9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation
>> timeout
>
>That's one of the "retryable exceptions" in Kazoo. So if you'd use
>client.retry, you could tolerate one or more instances of this error.
>
>> zookeeper logs around the error time:
>> 
>> 2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
>> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting
>> guide
>
>More than 2 seconds of fsync stall is quite long. And with that or GC
>pauses, it's more than likely that you exceed the session timeout
>limits.
>
>Did you follow the recommendations in
>http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html? Especially
>around using dedicated disks for the transaction log and using a
>dedicated machine for Zookeeper to avoid other processes stalling it?
>
>> Could the client (Kazoo) be timing out because of fsync delay? What
>> parameter would control duration for OperationTimeoutException that I
>>can
>> perhaps increase to verify? There is only ZooKeeper client and the load
>> isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper
>>configuration
>> is default. Kazoo client params are also default.
>
>In the admin guide, look at tickTime and syncLimit. In a default config
>the session timeout is ~4 seconds. While you can increase this value,
>you thereby also increase the minimum time it takes Zookeeper to
>consider an actual client to be dead. Depending on what you use ZK for,
>you might prefer failing fast and thus low session timeout values.
>
>Hanno
>
>



Re: OperationTimeoutException error

Posted by Hanno Schlichting <ha...@hannosch.eu>.
Hi.

On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
> Aug  9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation
> timeout

That's one of the "retryable exceptions" in Kazoo. So if you'd use
client.retry, you could tolerate one or more instances of this error.

> zookeeper logs around the error time:
> 
> 2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting
> guide

More than 2 seconds of fsync stall is quite long. And with that or GC
pauses, it's more than likely that you exceed the session timeout
limits.

Did you follow the recommendations in
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html? Especially
around using dedicated disks for the transaction log and using a
dedicated machine for Zookeeper to avoid other processes stalling it?

> Could the client (Kazoo) be timing out because of fsync delay? What
> parameter would control duration for OperationTimeoutException that I can
> perhaps increase to verify? There is only ZooKeeper client and the load
> isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper configuration
> is default. Kazoo client params are also default. 

In the admin guide, look at tickTime and syncLimit. In a default config
the session timeout is ~4 seconds. While you can increase this value,
you thereby also increase the minimum time it takes Zookeeper to
consider an actual client to be dead. Depending on what you use ZK for,
you might prefer failing fast and thus low session timeout values.

Hanno