You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "Mel Martinez (JIRA)" <ji...@apache.org> on 2016/11/28 17:25:58 UTC

[jira] [Commented] (ZOOKEEPER-2251) Add Client side packet response timeout to avoid infinite wait.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702561#comment-15702561 ] 

Mel Martinez commented on ZOOKEEPER-2251:
-----------------------------------------

Hi.  We are definitely running into what looks to be this problem and I was wondering if there was any movement towards incorporating the fix into a release version?

I've worked around the issue in one part of our code by wrapping my own timeout around the call that is vulnerable to the deadlock.

However, this isn't possible in some cases where 3rd party code (in our case, Accumulo client & Thrift libraries) call down into the ZooKeeper client code and hit this.

This has become a pretty critical issue for us. 

> Add Client side packet response timeout to avoid infinite wait.
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2251
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2251
>             Project: ZooKeeper
>          Issue Type: Bug
>            Reporter: nijel
>            Assignee: Arshad Mohammad
>         Attachments: ZOOKEEPER-2251-01.patch, ZOOKEEPER-2251-02.patch, ZOOKEEPER-2251-03.patch, ZOOKEEPER-2251-04.patch
>
>
> I came across one issue related to Client side packet response timeout In my cluster many packet drops happened for some time.
> One observation is the zookeeper client got hanged. As per the thread dump it is waiting for the response/ACK for the operation performed (synchronous API used here).
> I am using zookeeper.serverCnxnFactory=org.apache.zookeeper.server.NIOServerCnxnFactory
> Since only few packets missed there is no DISCONNECTED event occurred.
> Need add a "response time out" for the operations or packets.
> *Comments from [~rakeshr]*
> My observation about the problem:-
> * Can use tools like 'Wireshark' to simulate the artificial packet loss.
> * Assume there is only one packet in the 'outgoingQueue' and unfortunately the server response packet lost. Now, client will enter into infinite waiting. https://github.com/apache/zookeeper/blob/trunk/src/java/main/org/apache/zookeeper/ClientCnxn.java#L1515
> * Probably we can discuss more about this problem and possible solutions(add packet ACK timeout or another better approach) in the jira.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)