You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@zookeeper.apache.org by GitBox <gi...@apache.org> on 2022/08/18 15:20:46 UTC

[GitHub] [zookeeper] MikeEdgar opened a new pull request, #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

MikeEdgar opened a new pull request, #1917:
URL: https://github.com/apache/zookeeper/pull/1917

   Check for failed/cancelled ChannelFutures before acquiring connectLock. This prevents lock contention where cleanup is unable to complete.
   
   Replaces #1713


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] MikeEdgar commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by GitBox <gi...@apache.org>.
MikeEdgar commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1287298764

   @eolivelli , @maoling - please take a look when you have a chance. This replaces https://github.com/apache/zookeeper/pull/1713 that you looked at last year.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] sonatype-lift[bot] commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by GitBox <gi...@apache.org>.
sonatype-lift[bot] commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1287031813

   :warning: **52 God Classes** were detected by Lift in this project. [Visit the Lift web console](https://lift.sonatype.com/results/github.com/apache/zookeeper/01GFXDYKDY0JQG3X3R1R99WF5D?tab=technical-debt&utm_source=github.com&utm_campaign=lift-comment&utm_content=apache\%20zookeeper) for more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on a diff in pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by "kezhuw (via GitHub)" <gi...@apache.org>.
kezhuw commented on code in PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#discussion_r1198762873


##########
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java:
##########
@@ -196,13 +209,26 @@ public void operationComplete(ChannelFuture channelFuture) throws Exception {
         }
     }
 
+    boolean cancelled(ChannelFuture channelFuture) {
+        if (connectFuture == null) {
+            LOG.info("connect attempt cancelled");
+            // If the connect attempt was cancelled but succeeded
+            // anyway, make sure to close the channel, otherwise
+            // we may leak a file descriptor.
+            channelFuture.channel().close();
+            return true;
+        }
+        return false;
+    }
+
     @Override
     void cleanup() {
         connectLock.lock();
         try {
             if (connectFuture != null) {
                 connectFuture.cancel(false);

Review Comment:
   After taking a look at netty code [AbstractNioChannel::connect](https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/nio/AbstractNioChannel.java#L235), [AbstractEpollChannel::connect](https://github.com/netty/netty/blob/4.1/transport-classes-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollChannel.java#L588) and [AbstractKQueueChannel::connect](https://github.com/netty/netty/blob/4.1/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/AbstractKQueueChannel.java#L546), I think this statement `connectFuture.cancel(false)` cloud be the root cause.
   
   Basically, a connecting future is uncancellable which means `connectFuture.cancel(false)` could be a nop. So, the old `connectFuture` could still pending for completion. When we cleanup new `connectFuture` and old `connectFuture` completes, they are stuck due to leak of event threads to complete `channel.close().syncUninterruptibly()`(invoke close in event thread).
   
   I think we could also wait `connectFuture` to complete in `cleanup`, but I don't think it is a good. Connection timeout is a possible reason for us to fall here. It is nonsense to block next connection for the same reason.
   
   I have watched this issue for a while, and could not image a situation where there is single `connectFuture` and this stuck emerged. The `channel.close().syncUninterruptibly()` means its `connectFuture` already completed successfully.
   
   https://github.com/apache/zookeeper/blob/a64dbf5b06ca1a73dc2ad6c7d31e679e312082aa/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L86



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty [zookeeper]

Posted by "MikeEdgar (via GitHub)" <gi...@apache.org>.
MikeEdgar commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1760036809

   @eolivelli , what are you thoughts on @kezhuw's comments? Does this PR need additional updates or can it be accepted as-is? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] MikeEdgar commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by "MikeEdgar (via GitHub)" <gi...@apache.org>.
MikeEdgar commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1542077611

   @eolivelli @anmolnar @phunt @symat PTAL. The CI has not yet run with the test fix.
   
   Please also note my earlier question regarding `FollowerRequestProcessorTest`. The package declaration does not match the source directory. Is this intended?
   
   https://github.com/apache/zookeeper/blob/a64dbf5b06ca1a73dc2ad6c7d31e679e312082aa/zookeeper-server/src/test/java/org/apache/zookeeper/server/FollowerRequestProcessorTest.java#L19


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] eolivelli commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by GitBox <gi...@apache.org>.
eolivelli commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1335116169

   it looks some test related to this patch failed on CI
   
   ![image](https://user-images.githubusercontent.com/9469110/205284615-0e88a089-3c91-498e-a087-2aec1526e147.png)
   
   
   can you please take a look @MikeEdgar  ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] MikeEdgar commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by GitBox <gi...@apache.org>.
MikeEdgar commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1338209065

   @eolivelli , I've made a change that should address the NPE in the new test. 
   
   Question.. I see errors locally around `org.apache.zookeeper.server.FollowerRequestProcessorTest` having a bad `package` declaration. Is this intended? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] MikeEdgar commented on pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by "MikeEdgar (via GitHub)" <gi...@apache.org>.
MikeEdgar commented on PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#issuecomment-1504277284

   @eolivelli , please take another look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [zookeeper] kezhuw commented on a diff in pull request #1917: ZOOKEEPER-4293: Lock Contention in ClientCnxnSocketNetty

Posted by "kezhuw (via GitHub)" <gi...@apache.org>.
kezhuw commented on code in PR #1917:
URL: https://github.com/apache/zookeeper/pull/1917#discussion_r1198737024


##########
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java:
##########
@@ -196,13 +209,26 @@ public void operationComplete(ChannelFuture channelFuture) throws Exception {
         }
     }
 
+    boolean cancelled(ChannelFuture channelFuture) {
+        if (connectFuture == null) {
+            LOG.info("connect attempt cancelled");
+            // If the connect attempt was cancelled but succeeded
+            // anyway, make sure to close the channel, otherwise
+            // we may leak a file descriptor.
+            channelFuture.channel().close();
+            return true;
+        }
+        return false;
+    }
+
     @Override
     void cleanup() {
         connectLock.lock();
         try {
             if (connectFuture != null) {
                 connectFuture.cancel(false);
                 connectFuture = null;
+                afterConnectFutureCancel();
             }
             if (channel != null) {
                 channel.close().syncUninterruptibly();

Review Comment:
   I think it is a good to move this blocking `syncUninterruptibly` part out of lock scope. This could also solve the deadlock issue.



##########
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java:
##########
@@ -196,13 +209,26 @@ public void operationComplete(ChannelFuture channelFuture) throws Exception {
         }
     }
 
+    boolean cancelled(ChannelFuture channelFuture) {

Review Comment:
   I think we could concentrate more on identity of `channelFuture` and `connectFuture`. This will make it clear that `channelFuture` could come from a old connection. Better to document a bit.
   
   It is also possible that new connection is not completed yet. In this case, old connection should be simply filtered out, otherwise we could introduce inconsistence with outside world.



##########
zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java:
##########
@@ -196,13 +209,26 @@ public void operationComplete(ChannelFuture channelFuture) throws Exception {
         }
     }
 
+    boolean cancelled(ChannelFuture channelFuture) {
+        if (connectFuture == null) {
+            LOG.info("connect attempt cancelled");
+            // If the connect attempt was cancelled but succeeded
+            // anyway, make sure to close the channel, otherwise
+            // we may leak a file descriptor.
+            channelFuture.channel().close();
+            return true;
+        }
+        return false;
+    }
+
     @Override
     void cleanup() {
         connectLock.lock();
         try {
             if (connectFuture != null) {
                 connectFuture.cancel(false);

Review Comment:
   After taking a look at netty code [AbstractNioChannel::connect](https://github.com/netty/netty/blob/4.1/transport/src/main/java/io/netty/channel/nio/AbstractNioChannel.java#L235), [AbstractEpollChannel::connect](https://github.com/netty/netty/blob/4.1/transport-classes-epoll/src/main/java/io/netty/channel/epoll/AbstractEpollChannel.java#L588) and [AbstractKQueueChannel::connect](https://github.com/netty/netty/blob/4.1/transport-classes-kqueue/src/main/java/io/netty/channel/kqueue/AbstractKQueueChannel.java#L546), I think this statement `connectFuture.cancel(false)` cloud be the root cause.
   
   Basically, a connecting future in uncancellable which means `connectFuture.cancel(false)` could be a nop. So, the old `connectFuture` could still pending for completion. When we cleanup new `connectFuture` and old `connectFuture` completes, they are stuck due to leak of event threads to complete `channel.close().syncUninterruptibly()`(invoke close in event thread).
   
   I think we could also wait `connectFuture` to complete in `cleanup`, but I don't think it is a good. Connection timeout is a possible reason for us to fall here. It is nonsense to block next connection for the same reason.
   
   I have watched this issue for a while, and could not image a situation where there is single `connectFuture` and this stuck emerged. The `channel.close().syncUninterruptibly()` means its `connectFuture` already completed successfully.
   
   https://github.com/apache/zookeeper/blob/a64dbf5b06ca1a73dc2ad6c7d31e679e312082aa/zookeeper-server/src/main/java/org/apache/zookeeper/ClientCnxnSocketNetty.java#L86



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org