You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@zookeeper.apache.org by joshelser <gi...@git.apache.org> on 2017/03/07 21:10:12 UTC

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

GitHub user joshelser opened a pull request:

    https://github.com/apache/zookeeper/pull/186

    ZOOKEEPER-2711 Avoid synchronization on NettyServerCnxn in Factory

    NettyServerCnxnFactory previously synchronized on the (Netty)ServerCnxn
    object to provide mutual exclusion at the RPC layer. However, this was
    at odds with the synchronized methods in ServerCnxn (which shared the
    same monitor). As such, it was possible to deadlock between concurrent
    4LW commands that were invoking one of these synchronized methods
    on ServerCnxn.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/joshelser/zookeeper 2711-4lw

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/186.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #186
    
----
commit 9953e32963e6167f88b97519233b0872cce69a71
Author: Josh Elser <el...@apache.org>
Date:   2017-03-07T21:03:57Z

    ZOOKEEPER-2711 Avoid synchronization on NettyServerCnxn in Factory
    
    NettyServerCnxnFactory previously synchronized on the (Netty)ServerCnxn
    object to provide mutual exclusion at the RPC layer. However, this was
    at odds with the synchronized methods in ServerCnxn (which shared the
    same monitor). As such, it was possible to deadlock between concurrent
    4LW commands that were invoking one of these synchronized methods
    on ServerCnxn.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/186#discussion_r105444315
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
    @@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                                 + " from " + ctx.getChannel());
                     }
                     NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
    -                synchronized(cnxn) {
    +                synchronized(cnxn.getRpcLock()) {
                         processMessage(e, cnxn);
    --- End diff --
    
    > I think I like your way better. Although it may be a good idea to have a comment next to RPC_LOCK pointing to the JIRA
    
    Totally agree there :) 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #186: ZOOKEEPER-2711 Avoid synchronization on NettyServerCnx...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/zookeeper/pull/186
  
    I need to see if I can come up with a unit test to catch this case, but I'm not sure how useful one would be (I think anything I come up with would be a bit contrived..)
    
    I was able to test this locally by running 7 iterations of:
    
    ```bash
    while true; do echo stat | nc localhost 2181 >/dev/null 2>&1; echo -n .; sleep 1; done
    ```
    
    while generating some extra load (heavy java tests from another ASF project, *winks*) on my local machine. This ran for ~10minutes without error.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/186#discussion_r105297517
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
    @@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                                 + " from " + ctx.getChannel());
                     }
                     NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
    -                synchronized(cnxn) {
    +                synchronized(cnxn.getRpcLock()) {
                         processMessage(e, cnxn);
    --- End diff --
    
    > Now one thread can be in processMessage while another thread is getting stats about the connection. Is that ok?
    >> I believe this is OK. We can receive two concurrent stat commands, but we only process one of them at a time. I'm also not a Netty wizard, so I could be wildly wrong :)
    I'm not sure I agree with this. Since the results output by the stat command are no longer guaranteed to represent one single instance in time. 
    
    Would it be possible to put a shared lock on the command's execution (shared with any other 4LW that acquires a lock on a connection other than its own, I THINK `cons` is the only other one)?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

Posted by keith-turner <gi...@git.apache.org>.
Github user keith-turner commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/186#discussion_r104790720
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
    @@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                                 + " from " + ctx.getChannel());
                     }
                     NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
    -                synchronized(cnxn) {
    +                synchronized(cnxn.getRpcLock()) {
                         processMessage(e, cnxn);
    --- End diff --
    
    Now one thread can be in processMessage while another thread is getting stats about the connection.  Is that ok?  
    
    Do you know if anything else besides ServerCnxn is synchronizing on its monitor?  Basically I am wondering if there is any other code out there that should now sync on what getRpcLock() returns.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #186: ZOOKEEPER-2711 Avoid synchronization on NettyServerCnx...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/zookeeper/pull/186
  
    Rebased the changes on top of master. f127aab is the same, 854e48e adds a comment and a (hopefully not contrived) unit test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

Posted by afine <gi...@git.apache.org>.
Github user afine commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/186#discussion_r105300714
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
    @@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                                 + " from " + ctx.getChannel());
                     }
                     NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
    -                synchronized(cnxn) {
    +                synchronized(cnxn.getRpcLock()) {
                         processMessage(e, cnxn);
    --- End diff --
    
    >> Now one thread can be in processMessage while another thread is getting stats about the connection. Is that ok?
    
    > I believe this is OK. We can receive two concurrent stat commands, but we only process one of them at a time. I'm also not a Netty wizard, so I could be wildly wrong :)
    
    This should be OK since all fields printed by `synchronized void dumpConnectionInfo(PrintWriter pwriter, boolean brief)` are only updated in `synchronized` methods.
    
    Another option would be to put a shared lock on the command's execution (shared with any other 4LW that acquires a lock on a connection other than its own, I THINK `cons` is the only other one)? I think this may impact 4LW performance slightly but it would prevent the need to add another lock to `NettyServerCnxn`. 
    
    I think I like your way better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper issue #186: ZOOKEEPER-2711 Avoid synchronization on NettyServerCnx...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on the issue:

    https://github.com/apache/zookeeper/pull/186
  
    Also, I'll try to take a look at the test failures over the weekend and try to come up with a unit test as well.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zookeeper pull request #186: ZOOKEEPER-2711 Avoid synchronization on NettySe...

Posted by joshelser <gi...@git.apache.org>.
Github user joshelser commented on a diff in the pull request:

    https://github.com/apache/zookeeper/pull/186#discussion_r104792592
  
    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxnFactory.java ---
    @@ -162,7 +162,7 @@ public void messageReceived(ChannelHandlerContext ctx, MessageEvent e)
                                 + " from " + ctx.getChannel());
                     }
                     NettyServerCnxn cnxn = (NettyServerCnxn)ctx.getAttachment();
    -                synchronized(cnxn) {
    +                synchronized(cnxn.getRpcLock()) {
                         processMessage(e, cnxn);
    --- End diff --
    
    > Now one thread can be in processMessage while another thread is getting stats about the connection. Is that ok?
    
    I believe this is OK. We can receive two concurrent `stat` commands, but we only process one of them at a time. I'm also not a Netty wizard, so I could be wildly wrong :)
    
    > Basically I am wondering if there is any other code out there that should now sync on what getRpcLock() returns.
    
    That's a good point. I had looked to see that there were multiple other methods in `ServerCnxn` which were also synchronized (thus, changing `dumpConnectionInfo(..)` wouldn't have been sufficient). I didn't look to see if there's another code-path which is also synchronizing on the `NettyServerCnxn` (or `ServerCnxn`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---