You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by hanm <gi...@git.apache.org> on 2017/04/01 20:59:31 UTC

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

GitHub user hanm opened a pull request:

    https://github.com/apache/zookeeper/pull/211

    ZOOKEEPER-2743: Netty connection leaks JMX connection bean.

    See https://issues.apache.org/jira/browse/ZOOKEEPER-2743 for details on the symptom and diagnostic.
    
    There are many ways of fixing this. For example we can enforce certain ordering of the close operations to eliminate the race condition however that would incur non trivial performance penalty as a result of synchronization. I think the solution in this patch is simple and effective as it observes that the bean unregister call is idempotent and the call itself is trivial so we just always unregister connection upon close the connection, to ensure the bean is unregistered regardless of the races.
    
    I've also stress tested this solution on internal Jenkins which indicates no failures of https://issues.apache.org/jira/browse/ZOOKEEPER-2707 for the past two days.
    
    A side note is NIO connection in theory would suffer the same problem however I am unable to reproduce the same racing with existing unit test. So I just leave NIO as it is, for now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hanm/zookeeper ZOOKEEPER-2743

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/211.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #211
    
----
commit 7dd916c4c7c8d4e0e83cb5780175058a5626de19
Author: Michael Han <ha...@apache.org>
Date:   2017-04-01T20:37:48Z

    ZOOKEEPER-2743: Netty connection leaks JMX connection bean upon connection close in certain race conditions.
    Always unregister connection upon close to prevent connection bean leak under certain race conditions.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/211#discussion_r110711280
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
             }
    +
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    
    >> Could you please raise a new jira to track the changes.
    
    Created https://issues.apache.org/jira/browse/ZOOKEEPER-2751 for this. Merging this PR now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/211#discussion_r110082136
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
             }
    +
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    
    Also it is guaranteed that the close call will be called at least once after the cnx bean is registered (e.g. when sever shutdown, the factor will iterate every cnx and close it..).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by hanm <gi...@git.apache.org>.
Github user hanm commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/211#discussion_r110081828
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
             }
    +
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    
    That is fine, I might able to provide a formal verification of the theorem but here is a quick prove of that case:
    * Assume close is called before connection bean is registered [1]
    * The unregister bean in close call is no-op because the bean is not registered. But the channel will be closed, as part of close call.
    * Now before finalizing session returns, some sort of exception is going to throw, because the channel is closed. Probably here [2].
    * As part of exception the close is called again. This time it will unregister the bean (before this fix it will not, so it will miss this edge case.).
    
    Basically we are safe as close will be called multiple times and guaranteed at least one close call will happen after cnx bean is registered. 
    
    [1] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L699
    [2]
    https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L716


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by rakeshadr <gi...@git.apache.org>.
Github user rakeshadr commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/211#discussion_r110702056
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
             }
    +
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    
    Agreed Michael. I've noticed `NIOServerCnxnFactory#removeCnxn()` execution path, we should correct this part as well, like you mentioned in the PR comment. Could you please raise a new jira to track the  changes.
    
    +1 the netty code changes looks good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by rakeshadr <gi...@git.apache.org>.
Github user rakeshadr commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/211#discussion_r110080170
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
             }
    +
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    
    Any chance to occur another race condition, [1] session finalization is in progress [2] client cnxn closure due to server shutdown. Assume [1] step is in progress and not yet registered the connection bean [2] step is in progress and as per this patch unregistered bean irrespective of cnxn exists in factory and returns. Again assume now, [1] step continues and registering the connection bean. This is again leaking the cnxn bean, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #211: ZOOKEEPER-2743: Netty connection leaks JMX conn...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zookeeper/pull/211


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---