Posted to dev@zookeeper.apache.org by "Daniel Lord (Commented) (JIRA)" <ji...@apache.org> on 2011/09/26 23:10:16 UTC

[jira] [Commented] (ZOOKEEPER-126) zookeeper client close operation may block indefinitely

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13114934#comment-13114934 ] 

Daniel Lord commented on ZOOKEEPER-126:
---------------------------------------

Hey guys, I just ran into a cascading failure that ended up settling on this issue.  My SendThread was killed by an NPE thrown while it was trying to log some unknown exception.  The line that caused the SendThread to die was: 

at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1161)

This is using ZooKeeper 3.3.2.  The NPE from the logging call escapes as an uncaught exception and kills the SendThread.  Eventually the lack of a send thread causes the client to disconnect/expire.  At that point close() is called on the ZooKeeper client.  The close call hangs forever because the final packet is never sent, so it is never ACK'd and the waiting thread is never notify()'d. 
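
To make the failure mode concrete, here is a minimal sketch of a thread that dies because its own exception handler throws. This is not the ClientCnxn code: the class name, the null `context` variable and the exception types are all made up for illustration, and what ClientCnxn.java:1161 actually dereferences is not shown here.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for the failure mode described above; not ZooKeeper code.
class DyingSendThreadSketch {

    /** Runs a worker whose exception handler itself throws, and returns what killed it. */
    static Throwable runWorker() throws InterruptedException {
        AtomicReference<Throwable> cause = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            try {
                // Stand-in for the "unknown exception" raised inside the run loop.
                throw new IllegalStateException("some unknown I/O failure");
            } catch (Exception e) {
                // Assumption: the log statement expects a value that happens to be null.
                String context = null;
                // Dereferencing null here throws a NullPointerException out of the
                // catch block, so nothing is left to stop it from ending the thread.
                System.err.println("error in " + context.trim() + ": " + e);
            }
        }, "send-thread-sketch");
        // Record whatever escapes run() so the caller can inspect it.
        worker.setUncaughtExceptionHandler((t, ex) -> cause.set(ex));
        worker.start();
        worker.join();
        return cause.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker died from: " + runWorker());
    }
}
```

Once a thread like this is gone, anything that wait()s for it to send and acknowledge a packet has nobody left to notify() it, which matches the hang described above.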

Should this NPE be filed as a new issue or are you all content that this timed close will solve the problem?
                
> zookeeper client close operation may block indefinitely
> -------------------------------------------------------
>
>                 Key: ZOOKEEPER-126
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-126
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: java client
>            Reporter: Patrick Hunt
>             Fix For: 3.5.0
>
>         Attachments: ZOOKEEPER-126.patch, ZOOKEEPER-126.patch
>
>
> Moving the hang issue from ZOOKEEPER-63 to here. See 63 for background and potential patch (patch_ZOOKEEPER-63.patch).
> specifically (from James): 
> "I'm thinking the close() method should not wait() forever on the disconnect packet, just a closeTimeout length - say a few seconds. Afterall blocking and forcing a reconnect just to redeliver the disconnect packet seems a bit silly - when the server has to deal with clients which just have their sockets fail anyway"
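
A rough sketch of the bounded wait James describes, assuming a packet object with a finished flag guarded by its own monitor. `Packet`, `awaitFinished`, `markFinished` and `closeTimeoutMs` are illustrative names rather than the real client API, and this is not the attached patch.

```java
// Hypothetical sketch of a bounded close; all names here are illustrative.
class TimedCloseSketch {

    static final class Packet {
        private boolean finished;

        /** Called by the I/O thread once the server has acknowledged the packet. */
        synchronized void markFinished() {
            finished = true;
            notifyAll();
        }

        /** Waits up to timeoutMs for the acknowledgement; returns false on timeout. */
        synchronized boolean awaitFinished(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
            while (!finished) {
                long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMs <= 0) {
                    return false; // give up instead of calling wait() with no bound
                }
                wait(remainingMs); // loop re-checks the flag, so spurious wakeups are harmless
            }
            return true;
        }
    }

    /** Returns true if the disconnect packet was acknowledged within the timeout. */
    static boolean close(Packet disconnect, long closeTimeoutMs) throws InterruptedException {
        boolean acked = disconnect.awaitFinished(closeTimeoutMs);
        // Either way, the caller would release sockets and threads at this point.
        return acked;
    }

    public static void main(String[] args) throws InterruptedException {
        Packet neverSent = new Packet(); // nobody will ever call markFinished()
        System.out.println("acked: " + close(neverSent, 200));
    }
}
```

With a dead send thread the call returns false after roughly the timeout instead of blocking forever; the session is then left for the server to expire, as it would be for a client whose socket simply failed.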

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira