You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/04/01 20:59:41 UTC

[jira] [Commented] (ZOOKEEPER-2743) Netty connection leaks JMX connection bean upon connection close in certain race conditions.

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952406#comment-15952406 ] 

ASF GitHub Bot commented on ZOOKEEPER-2743:
-------------------------------------------

GitHub user hanm opened a pull request:

    https://github.com/apache/zookeeper/pull/211

    ZOOKEEPER-2743: Netty connection leaks JMX connection bean.

    See https://issues.apache.org/jira/browse/ZOOKEEPER-2743 for details on the symptom and diagnostic.
    
    There are many ways of fixing this. For example we can enforce certain ordering of the close operations to eliminate the race condition however that would incur non trivial performance penalty as a result of synchronization. I think the solution in this patch is simple and effective as it observes that the bean unregister call is idempotent and the call itself is trivial so we just always unregister connection upon close the connection, to ensure the bean is unregistered regardless of the races.
    
    I've also stress tested this solution on internal Jenkins which indicates no failures of https://issues.apache.org/jira/browse/ZOOKEEPER-2707 for the past two days.
    
    A side note is NIO connection in theory would suffer the same problem however I am unable to reproduce the same racing with existing unit test. So I just leave NIO as it is, for now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hanm/zookeeper ZOOKEEPER-2743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #211
    
----
commit 7dd916c4c7c8d4e0e83cb5780175058a5626de19
Author: Michael Han <ha...@apache.org>
Date:   2017-04-01T20:37:48Z

    ZOOKEEPER-2743: Netty connection leaks JMX connection bean upon connection close in certain race conditions.
    Always unregister connection upon close to prevent connection bean leak under certain race conditions.

----


> Netty connection leaks JMX connection bean upon connection close in certain race conditions.
> --------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2743
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2743
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.10, 3.5.2
>            Reporter: Michael Han
>            Assignee: Michael Han
>              Labels: netty
>
> This is a tricky issue found while debugging failure of "flaky" watcher test (ZOOKEEPER-2686). When closing a Netty connection, depend on timing the connection bean registered when the connection was provisioned might not get unregistered, leading to leaked Java beans. 
> The race happens at the time when the client is in the process of finalizing the session. As part of session finalization, a connection bean will be registered [1]. But right before the registering bean, the connection might gets closed, in cases for example the server that the client is connecting to is shutdown. As part of connection close, the bean will be un-registered, as expected [2], however the problem is when we execute at [2], the connection bean might not finish registering at [1], so the unregister of bean is a NOP. What's worse, as part of connection close, we remove this connection from connection factory [3], so future connection close call will get short circuited and directly return; in other words the bean unregister code in connection close call will only get executed once. Depends on luck, the bean might not get unregistered, as previously illustrated.
> [1] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L700
> [2] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L114
> [3] 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L96



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)