You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/06/05 08:50:46 UTC

[GitHub] [hbase] Apache9 opened a new pull request #1858: HBASE-24506 async client deadlock

Apache9 opened a new pull request #1858:
URL: https://github.com/apache/hbase/pull/1858


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436370826



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -73,40 +79,77 @@
   private static final Logger LOG = LoggerFactory.getLogger(NettyRpcConnection.class);
 
   private static final ScheduledExecutorService RELOGIN_EXECUTOR =
-      Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
+    Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
 
   private final NettyRpcClient rpcClient;
 
+  // the event loop used to set up the connection, we will also execute other operations for this
+  // connection in this event loop, to avoid locking everywhere.
+  private final EventLoop eventLoop;
+
   private ByteBuf connectionHeaderPreamble;
 
   private ByteBuf connectionHeaderWithLength;
 
-  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "IS2_INCONSISTENT_SYNC",
-      justification = "connect is also under lock as notifyOnCancel will call our action directly")
-  private Channel channel;
+  // make it volatile so in the isActive method below we do not need to switch to the event loop
+  // thread to access this field.
+  private volatile Channel channel;
 
   NettyRpcConnection(NettyRpcClient rpcClient, ConnectionId remoteId) throws IOException {
     super(rpcClient.conf, AbstractRpcClient.WHEEL_TIMER, remoteId, rpcClient.clusterId,
-        rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
+      rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
     this.rpcClient = rpcClient;
+    this.eventLoop = rpcClient.group.next();
     byte[] connectionHeaderPreamble = getConnectionHeaderPreamble();
     this.connectionHeaderPreamble =
-        Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
+      Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
     ConnectionHeader header = getConnectionHeader();
     this.connectionHeaderWithLength = Unpooled.directBuffer(4 + header.getSerializedSize());
     this.connectionHeaderWithLength.writeInt(header.getSerializedSize());
     header.writeTo(new ByteBufOutputStream(this.connectionHeaderWithLength));
   }
 
-  @Override
-  protected synchronized void callTimeout(Call call) {
-    if (channel != null) {
-      channel.pipeline().fireUserEventTriggered(new CallEvent(TIMEOUT, call));
+  private static final FastThreadLocal<MutableInt> DEPTH = new FastThreadLocal<MutableInt>() {

Review comment:
       The overhead is very small I suppose? Only a final static field...
   
   Using Short or Byte is easy to introduce other problems as when comparing they will be cast to int...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640252687


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 36s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 35s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  2s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 16s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 28s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 21s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 33s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  0s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 134m  0s |  hbase-server in the patch passed.  |
   |  |   | 161m 27s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 383128c9ea53 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f2fde77fc3 |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/testReport/ |
   | Max. process+thread count | 4775 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] esteban commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
esteban commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436153301



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -146,45 +163,43 @@ private void established(Channel ch) throws IOException {
   private boolean reloginInProgress;
 
   private void scheduleRelogin(Throwable error) {
+    assert eventLoop.inEventLoop();
     if (error instanceof FallbackDisallowedException) {
       return;
     }
     if (!provider.canRetry()) {
       LOG.trace("SASL Provider does not support retries");
       return;
     }
-    synchronized (this) {
-      if (reloginInProgress) {
-        return;
-      }
-      reloginInProgress = true;
-      RELOGIN_EXECUTOR.schedule(new Runnable() {
+    if (reloginInProgress) {
+      return;
+    }
+    reloginInProgress = true;
+    RELOGIN_EXECUTOR.schedule(new Runnable() {
 
-        @Override
-        public void run() {
-          try {
-            provider.relogin();
-          } catch (IOException e) {
-            LOG.warn("Relogin failed", e);
-          }
-          synchronized (this) {
-            reloginInProgress = false;
-          }
+      @Override
+      public void run() {
+        try {
+          provider.relogin();
+        } catch (IOException e) {
+          LOG.warn("Relogin failed", e);
         }
-      }, ThreadLocalRandom.current().nextInt(reloginMaxBackoff), TimeUnit.MILLISECONDS);
-    }
+        eventLoop.execute(() -> {
+          reloginInProgress = false;
+        });
+      }
+    }, ThreadLocalRandom.current().nextInt(reloginMaxBackoff), TimeUnit.MILLISECONDS);
   }
 
   private void failInit(Channel ch, IOException e) {
-    synchronized (this) {
-      // fail all pending calls
-      ch.pipeline().fireUserEventTriggered(BufferCallEvent.fail(e));
-      shutdown0();
-      return;
-    }
+    assert eventLoop.inEventLoop();
+    // fail all pending calls
+    ch.pipeline().fireUserEventTriggered(BufferCallEvent.fail(e));
+    shutdown0();
   }
 
   private void saslNegotiate(final Channel ch) {
+    assert eventLoop.inEventLoop();

Review comment:
       same https://github.com/apache/hbase/pull/1858/commits/08b77a5eca368f391f528e009c95e749d5fa750f#r436153126




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] busbey edited a comment on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
busbey edited a comment on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640095140


   That reasoning sounds like a correct use of asserts. Just wanted to make sure we're guarding against a code change and not something folks could get configured wrong in a prod deployment.
   
   Thanks for writing out the explanation!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436244428



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {
+  private void sendRequest0(Call call, HBaseRpcController hrc) throws IOException {
+    assert eventLoop.inEventLoop();

Review comment:
       I think you two are talking about different things? For the assert, just want to make sure that later developpers do not break the assumption as usually we will enable assert for UTs. For a production cluster where you want performance, you can just disable assert.
   
   And on the performance effect on event loop, I think it would be fine. As when you call write on a channel, finally they will switch to the event loop thread to call actual handlers. But anyway, we could run a PE to see the performance impact.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436123009



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -73,40 +73,53 @@
   private static final Logger LOG = LoggerFactory.getLogger(NettyRpcConnection.class);
 
   private static final ScheduledExecutorService RELOGIN_EXECUTOR =
-      Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
+    Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
 
   private final NettyRpcClient rpcClient;
 
+  // the event loop used to set up the connection, we will also execute other operations for this
+  // connection in this event loop, to avoid locking everywhere.
+  private final EventLoop eventLoop;
+
   private ByteBuf connectionHeaderPreamble;
 
   private ByteBuf connectionHeaderWithLength;
 
-  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "IS2_INCONSISTENT_SYNC",
-      justification = "connect is also under lock as notifyOnCancel will call our action directly")
-  private Channel channel;
+  private volatile Channel channel;
 
   NettyRpcConnection(NettyRpcClient rpcClient, ConnectionId remoteId) throws IOException {
     super(rpcClient.conf, AbstractRpcClient.WHEEL_TIMER, remoteId, rpcClient.clusterId,
-        rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
+      rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
     this.rpcClient = rpcClient;
+    this.eventLoop = rpcClient.group.next();
     byte[] connectionHeaderPreamble = getConnectionHeaderPreamble();
     this.connectionHeaderPreamble =
-        Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
+      Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
     ConnectionHeader header = getConnectionHeader();
     this.connectionHeaderWithLength = Unpooled.directBuffer(4 + header.getSerializedSize());
     this.connectionHeaderWithLength.writeInt(header.getSerializedSize());
     header.writeTo(new ByteBufOutputStream(this.connectionHeaderWithLength));
   }
 
-  @Override
-  protected synchronized void callTimeout(Call call) {
-    if (channel != null) {
-      channel.pipeline().fireUserEventTriggered(new CallEvent(TIMEOUT, call));
+  private void execute(Runnable action) {
+    if (eventLoop.inEventLoop()) {

Review comment:
       nit: would the first if() be ever true for client threads?

##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       Just a question, why does it have to be giant synchronized? What is the un-safe shared state? Wondering if we can narrow down the lock and simplify the threading model to fix the actual problem.

##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {
+  private void sendRequest0(Call call, HBaseRpcController hrc) throws IOException {
+    assert eventLoop.inEventLoop();

Review comment:
       Curious to know your thoughts on the performance implications of this change. Now that the event loop is responsible for more than what it was doing before, how would it affect the overall throughput.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640235516


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 23s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 31s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 54s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 21s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 27s |  hbase-client: The patch generated 0 new + 1 unchanged - 4 fixed = 1 total (was 5)  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  The patch passed checkstyle in hbase-server  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  6s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   3m 13s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 24s |  The patch does not generate ASF License warnings.  |
   |  |   |  35m 59s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 5259f0afade0 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f2fde77fc3 |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436244996



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -146,45 +163,43 @@ private void established(Channel ch) throws IOException {
   private boolean reloginInProgress;
 
   private void scheduleRelogin(Throwable error) {
+    assert eventLoop.inEventLoop();

Review comment:
       My intention is to let later developpers know that this method must be called in the event loop thread. Do not want to introduce any overhead when executing the code in production, as in production, if it is not called in an event loop thread, it will be a ctitical bug and can not be recovered, so I do not think it worth other checks like Preconditions.checkXXX...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] busbey commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
busbey commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640095140


   That reasoning sounds like a correct use if asserts. Just wanted to make sure we're guarding against a code change and not something folks could get configured wrong in a prod deployment.
   
   Thanks for writing out the explanation!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-639484408


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   2m 44s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 59s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 40s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 50s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 16s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m  8s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 35s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 35s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   7m  7s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m 16s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 30s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 220m  3s |  hbase-server in the patch passed.  |
   |  |   | 256m  0s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux a8ab42fc6300 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 2c2b1f0174 |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/testReport/ |
   | Max. process+thread count | 3674 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436337656



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       This is covered already right? Maybe older comment, I might have read it in different order. Sorry for the confusion.
   Btw +1 for volatile channel and removal of synchronized from this method as netty is taking care of incoming requests already.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640090019


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  3s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 20s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 42s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 38s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 27s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 26s |  hbase-client in master failed.  |
   | -0 :warning: |  javadoc  |   0m 42s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 28s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 36s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 36s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 19s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 25s |  hbase-client in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 49s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 33s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 189m 13s |  hbase-server in the patch passed.  |
   |  |   | 221m 55s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 9abc35ba0ec9 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 16116fa35e |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/testReport/ |
   | Max. process+thread count | 3156 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] esteban commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
esteban commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436155356



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {
+  private void sendRequest0(Call call, HBaseRpcController hrc) throws IOException {
+    assert eventLoop.inEventLoop();

Review comment:
       +1 I don't think the assert should be sitting in a hot path.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640047890


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   2m 32s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   4m 41s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 39s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 23s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 27s |  hbase-client in master failed.  |
   | -0 :warning: |  javadoc  |   0m 40s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 14s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 31s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 36s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 36s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 22s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 25s |  hbase-client in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 40s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 17s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 191m 33s |  hbase-server in the patch passed.  |
   |  |   | 225m 23s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 88f6cac99ca5 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 89b7b5a7f9 |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/testReport/ |
   | Max. process+thread count | 3951 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436337656



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       This is covered already right? Maybe older comment, I might have read it in different order. Sorry for the confusion.
   Btw +1 for volatile channel and removal of synchronized from this method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436244858



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       The lock is to make sure that we always use the up to date Channel instance. You can see the old code, every time when we want to use the channel field, we will have a lock on it. And on the old sendRequest implementation, the problem is that, we need to hold the lock for the rpc controller, so that there will be no race when users call startCancel on the rpc controller. And to prevent dead lock, the preferred order is to acquire connection lock first, and then rpc controller lock, so we have to acquire the lock before calling notifyOnCancel method of the rpc controller...




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436346597



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       The channel instance is created on-demand in NettyRpcConnection, so we need to acquire lock to prevernt multiple creations. And also, the channel instance could be cleaned up, this is another reason that we need to acquire lock, otherwise there will be race that you operate on a closed channel and cause trouble.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640014251


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 30s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 24s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 33s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 32s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 57s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 21s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  hbase-client: The patch generated 0 new + 0 unchanged - 4 fixed = 0 total (was 4)  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  The patch passed checkstyle in hbase-server  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  1s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   3m 11s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 27s |  The patch does not generate ASF License warnings.  |
   |  |   |  36m  0s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 6897b7d01304 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 89b7b5a7f9 |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] esteban commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
esteban commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436153505



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();

Review comment:
       same https://github.com/apache/hbase/pull/1858/commits/08b77a5eca368f391f528e009c95e749d5fa750f#r436153126




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640189751


   > Can we make sure we have a test that asserts are enabled that will also run when we run whatever tests we expect to exercise this code path to catch an assert failure should someone change the code such that these things run outside of the event loop?
   > 
   > Something simple like a test that expects the exception from an assertion failure implemented with an assert that always fails?
   
   Maybe call it from the test thread to check whether assert is enabled?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] esteban commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
esteban commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436153126



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -146,45 +163,43 @@ private void established(Channel ch) throws IOException {
   private boolean reloginInProgress;
 
   private void scheduleRelogin(Throwable error) {
+    assert eventLoop.inEventLoop();
     if (error instanceof FallbackDisallowedException) {
       return;
     }
     if (!provider.canRetry()) {
       LOG.trace("SASL Provider does not support retries");
       return;
     }
-    synchronized (this) {
-      if (reloginInProgress) {
-        return;
-      }
-      reloginInProgress = true;
-      RELOGIN_EXECUTOR.schedule(new Runnable() {
+    if (reloginInProgress) {
+      return;
+    }
+    reloginInProgress = true;
+    RELOGIN_EXECUTOR.schedule(new Runnable() {
 
-        @Override
-        public void run() {
-          try {
-            provider.relogin();
-          } catch (IOException e) {
-            LOG.warn("Relogin failed", e);
-          }
-          synchronized (this) {
-            reloginInProgress = false;
-          }
+      @Override
+      public void run() {
+        try {
+          provider.relogin();
+        } catch (IOException e) {
+          LOG.warn("Relogin failed", e);
         }
-      }, ThreadLocalRandom.current().nextInt(reloginMaxBackoff), TimeUnit.MILLISECONDS);
-    }
+        eventLoop.execute(() -> {
+          reloginInProgress = false;
+        });
+      }
+    }, ThreadLocalRandom.current().nextInt(reloginMaxBackoff), TimeUnit.MILLISECONDS);
   }
 
   private void failInit(Channel ch, IOException e) {
-    synchronized (this) {
-      // fail all pending calls
-      ch.pipeline().fireUserEventTriggered(BufferCallEvent.fail(e));
-      shutdown0();
-      return;
-    }
+    assert eventLoop.inEventLoop();

Review comment:
       Same as @busbey mentioned at L166, asserts are not usually enabled.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-639442065


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  2s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   5m 10s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m 56s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 30s |  hbase-client in master failed.  |
   | -0 :warning: |  javadoc  |   0m 46s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 17s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 41s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 41s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 41s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m 29s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 29s |  hbase-client in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 46s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 15s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 132m 18s |  hbase-server in the patch passed.  |
   |  |   | 166m 46s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 908c2db63594 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 2c2b1f0174 |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/testReport/ |
   | Max. process+thread count | 4022 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] busbey commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
busbey commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640095417


   Can we make sure we have a test that asserts are enabled that will also run when we run whatever tests we expect to exercise this code path to catch an assert failure should someone change the code such that these things run outside of the event loop?
   
   Something simple like a test that expects the exception from an assertion failure implemented with an assert that always fails?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-639351935


   Add a minor improvement for TestAsyncTableGetMultiThreaded, so we will also run the UTs in hbase-server.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 merged pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 merged pull request #1858:
URL: https://github.com/apache/hbase/pull/1858


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436336655



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -153,35 +200,30 @@ private void scheduleRelogin(Throwable error) {
       LOG.trace("SASL Provider does not support retries");
       return;
     }
-    synchronized (this) {
-      if (reloginInProgress) {
-        return;
-      }
-      reloginInProgress = true;
-      RELOGIN_EXECUTOR.schedule(new Runnable() {
+    if (reloginInProgress) {
+      return;
+    }
+    reloginInProgress = true;
+    RELOGIN_EXECUTOR.schedule(new Runnable() {

Review comment:
       nit: Would you prefer converting to lambda `() -> `? 

##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -73,40 +79,77 @@
   private static final Logger LOG = LoggerFactory.getLogger(NettyRpcConnection.class);
 
   private static final ScheduledExecutorService RELOGIN_EXECUTOR =
-      Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
+    Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
 
   private final NettyRpcClient rpcClient;
 
+  // the event loop used to set up the connection, we will also execute other operations for this
+  // connection in this event loop, to avoid locking everywhere.
+  private final EventLoop eventLoop;
+
   private ByteBuf connectionHeaderPreamble;
 
   private ByteBuf connectionHeaderWithLength;
 
-  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "IS2_INCONSISTENT_SYNC",
-      justification = "connect is also under lock as notifyOnCancel will call our action directly")
-  private Channel channel;
+  // make it volatile so in the isActive method below we do not need to switch to the event loop
+  // thread to access this field.
+  private volatile Channel channel;
 
   NettyRpcConnection(NettyRpcClient rpcClient, ConnectionId remoteId) throws IOException {
     super(rpcClient.conf, AbstractRpcClient.WHEEL_TIMER, remoteId, rpcClient.clusterId,
-        rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
+      rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
     this.rpcClient = rpcClient;
+    this.eventLoop = rpcClient.group.next();
     byte[] connectionHeaderPreamble = getConnectionHeaderPreamble();
     this.connectionHeaderPreamble =
-        Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
+      Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
     ConnectionHeader header = getConnectionHeader();
     this.connectionHeaderWithLength = Unpooled.directBuffer(4 + header.getSerializedSize());
     this.connectionHeaderWithLength.writeInt(header.getSerializedSize());
     header.writeTo(new ByteBufOutputStream(this.connectionHeaderWithLength));
   }
 
-  @Override
-  protected synchronized void callTimeout(Call call) {
-    if (channel != null) {
-      channel.pipeline().fireUserEventTriggered(new CallEvent(TIMEOUT, call));
+  private static final FastThreadLocal<MutableInt> DEPTH = new FastThreadLocal<MutableInt>() {

Review comment:
       nit: if we want to go till depth of 4, want to use `MutableShort`? 

##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       This is covered already right?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] busbey commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
busbey commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436074273



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -146,45 +163,43 @@ private void established(Channel ch) throws IOException {
   private boolean reloginInProgress;
 
   private void scheduleRelogin(Throwable error) {
+    assert eventLoop.inEventLoop();

Review comment:
       just checking you want these to be java asserts? production jdks are almost never run with asserts enabled




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-639379476


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 32s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 34s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 36s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m 56s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 21s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  hbase-client: The patch generated 0 new + 0 unchanged - 4 fixed = 0 total (was 4)  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  The patch passed checkstyle in hbase-server  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m  6s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   3m 11s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 26s |  The patch does not generate ASF License warnings.  |
   |  |   |  36m  8s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux fbe9ee49483e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 2c2b1f0174 |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640229473


   > Can we make sure we have a test that asserts are enabled that will also run when we run whatever tests we expect to exercise this code path to catch an assert failure should someone change the code such that these things run outside of the event loop?
   > 
   > Something simple like a test that expects the exception from an assertion failure implemented with an assert that always fails?
   
   Done. Please see TestNettyRpcConnection.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436321731



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       Thanks for the explanation.
   
   > The lock is to make sure that we always use the up to date Channel instance. 
   
   Ya, this is the part that was not clear to me. As I understand it, Netty channel implementations are thread-safe (correct me if I'm wrong here). If I read the netty code correctly, all the channel/pipeline operations are internally enqueued on the eventloop.
   
   So why can't we just make it a volatile and we have the up to date channel  for all threads to operate on? That removes one lock from the nested locking protocol for Channels and RpcControllers. In other words, whats the need to serialize all the operations on a channel inside NettyRpcConnection, before with synchronized and now with the event loop.
   
   (Also posted a question for you on jira, just so that I understand the problem 100%, thanks for your patience).
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640047121


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 24s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 21s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 58s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 22s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   6m  4s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 13s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 48s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 19s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   6m  2s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 12s |  hbase-client in the patch passed.  |
   | -1 :x: |  unit  | 192m 26s |  hbase-server in the patch failed.  |
   |  |   | 221m  3s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 8a688ac03090 4.15.0-101-generic #102-Ubuntu SMP Mon May 11 10:07:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 89b7b5a7f9 |
   | Default Java | 1.8.0_232 |
   | unit | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/testReport/ |
   | Max. process+thread count | 3736 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640083777


   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 29s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 22s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 37s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 48s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 15s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 36s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 21s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 36s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  4s |  hbase-client in the patch passed.  |
   | -1 :x: |  unit  | 146m 55s |  hbase-server in the patch failed.  |
   |  |   | 174m 33s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux a799dff27b86 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 16116fa35e |
   | Default Java | 1.8.0_232 |
   | unit | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-jdk8-hadoop3-check/output/patch-unit-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/testReport/ |
   | Max. process+thread count | 3895 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436409895



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       Hmm okay. I still think the locking can be made fine grained with rw locks for channel and totally get rid of eventloop but its okay, your perf run shows no regression, so its fine.

##########
File path: hbase-client/src/test/java/org/apache/hadoop/hbase/ipc/TestNettyRpcConnection.java
##########
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.ipc;
+
+import static org.hamcrest.CoreMatchers.instanceOf;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.Assert.assertThrows;
+import static org.junit.Assert.fail;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.lang.reflect.Modifier;
+import java.net.InetSocketAddress;
+import org.apache.hadoop.hbase.HBaseClassTestRule;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.security.User;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.com.google.common.io.Closeables;
+
+public class TestNettyRpcConnection {
+
+  @ClassRule
+  public static final HBaseClassTestRule CLASS_RULE =
+    HBaseClassTestRule.forClass(TestNettyRpcConnection.class);
+
+  private static final Logger LOG = LoggerFactory.getLogger(TestNettyRpcConnection.class);
+
+  private static NettyRpcClient CLIENT;
+
+  private static NettyRpcConnection CONN;
+
+  @BeforeClass
+  public static void setUp() throws IOException {
+    CLIENT = new NettyRpcClient(HBaseConfiguration.create());
+    CONN = new NettyRpcConnection(CLIENT,
+      new ConnectionId(User.getCurrent(), "test", new InetSocketAddress("localhost", 1234)));
+  }
+
+  @AfterClass
+  public static void tearDown() throws IOException {
+    Closeables.close(CLIENT, true);
+  }
+
+  @Test
+  public void testPrivateMethodExecutedInEventLoop() throws IllegalAccessException {
+    // make sure the test is executed with "-ea"
+    assertThrows(AssertionError.class, () -> {
+      assert false;
+    });
+    for (Method method : NettyRpcConnection.class.getDeclaredMethods()) {
+      if (Modifier.isPrivate(method.getModifiers()) && !method.getName().contains("$")) {
+        LOG.info("checking {}", method);
+        method.setAccessible(true);
+        // all private methods should be called inside the event loop thread, so calling it from

Review comment:
       haha, nice test :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640059821


   @busbey it is really not easy to implement the UT. You have to use the RoundRobin connection pool and set the pool size to greater than 1. And I tried several ways, it is really not easy to control which connection is returned, as the retries are in a separated timer thread, and also the first attemp is in the client thread, not in the event loop thread...
   
   Since the patch here will remove all locks, we can make sure that the dead lock problem is gone, so I think it is fine to not introduce a UT?
   
   I will have a simple PE run to see if there are performance issues.
   
   Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-639996204


   > Can we add a unit test that shows the deadlock prior to this fix?
   
   Was thinking of this but finally when I found that it is possible to implement the logic without locking...
   
   Anyway, let me thin if there is a way to reproduce the dead lock stably.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640252043


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   3m 52s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 59s |  master passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 45s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 28s |  hbase-client in master failed.  |
   | -0 :warning: |  javadoc  |   0m 39s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 16s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 55s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 32s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 32s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 42s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 26s |  hbase-client in the patch failed.  |
   | -0 :warning: |  javadoc  |   0m 38s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  6s |  hbase-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 124m 19s |  hbase-server in the patch passed.  |
   |  |   | 157m  0s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux f9908f620acb 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f2fde77fc3 |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-client.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/testReport/ |
   | Max. process+thread count | 4144 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/4/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436244190



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -73,40 +73,53 @@
   private static final Logger LOG = LoggerFactory.getLogger(NettyRpcConnection.class);
 
   private static final ScheduledExecutorService RELOGIN_EXECUTOR =
-      Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
+    Executors.newSingleThreadScheduledExecutor(Threads.newDaemonThreadFactory("Relogin"));
 
   private final NettyRpcClient rpcClient;
 
+  // the event loop used to set up the connection, we will also execute other operations for this
+  // connection in this event loop, to avoid locking everywhere.
+  private final EventLoop eventLoop;
+
   private ByteBuf connectionHeaderPreamble;
 
   private ByteBuf connectionHeaderWithLength;
 
-  @edu.umd.cs.findbugs.annotations.SuppressWarnings(value = "IS2_INCONSISTENT_SYNC",
-      justification = "connect is also under lock as notifyOnCancel will call our action directly")
-  private Channel channel;
+  private volatile Channel channel;
 
   NettyRpcConnection(NettyRpcClient rpcClient, ConnectionId remoteId) throws IOException {
     super(rpcClient.conf, AbstractRpcClient.WHEEL_TIMER, remoteId, rpcClient.clusterId,
-        rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
+      rpcClient.userProvider.isHBaseSecurityEnabled(), rpcClient.codec, rpcClient.compressor);
     this.rpcClient = rpcClient;
+    this.eventLoop = rpcClient.group.next();
     byte[] connectionHeaderPreamble = getConnectionHeaderPreamble();
     this.connectionHeaderPreamble =
-        Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
+      Unpooled.directBuffer(connectionHeaderPreamble.length).writeBytes(connectionHeaderPreamble);
     ConnectionHeader header = getConnectionHeader();
     this.connectionHeaderWithLength = Unpooled.directBuffer(4 + header.getSerializedSize());
     this.connectionHeaderWithLength.writeInt(header.getSerializedSize());
     header.writeTo(new ByteBufOutputStream(this.connectionHeaderWithLength));
   }
 
-  @Override
-  protected synchronized void callTimeout(Call call) {
-    if (channel != null) {
-      channel.pipeline().fireUserEventTriggered(new CallEvent(TIMEOUT, call));
+  private void execute(Runnable action) {
+    if (eventLoop.inEventLoop()) {

Review comment:
       If your client is also in the same event loop thread then it will return yes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640065523


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 10s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 23s |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |   3m 55s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 38s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   3m  7s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 12s |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   3m 41s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 28s |  hbase-client: The patch generated 0 new + 0 unchanged - 4 fixed = 0 total (was 4)  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  The patch passed checkstyle in hbase-server  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m 14s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   3m 24s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 23s |  The patch does not generate ASF License warnings.  |
   |  |   |  39m 40s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.11 Server=19.03.11 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1858 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux d4280188456c 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 16116fa35e |
   | Max. process+thread count | 84 (vs. ulimit of 12500) |
   | modules | C: hbase-client hbase-server U: . |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1858/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#discussion_r436321731



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       Thanks for the explanation.
   
   > The lock is to make sure that we always use the up to date Channel instance. 
   
   Ya, this is the part that was not clear to me. As I understand it, Netty channel implementations are thread-safe (correct me if I'm wrong here). If I read the netty code correctly, all the channel/pipeline operations are internally enqueued on the eventloop.
   
   So why can't we just make it a volatile and we have the up to date channel  for all threads to operate on? That removes one lock from the nested locking protocol for Channels and RpcControllers. In other words, whats the need to serialize all the operations on a channel inside NettyRpcConnection, before with synchronized and now with the event loop.
   
   (Also posted a question for you on jira, just so that I understand the problem 100%, thanks for your patience).
   

##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/ipc/NettyRpcConnection.java
##########
@@ -253,52 +268,38 @@ public void operationComplete(Future<Boolean> future) throws Exception {
   }
 
   private void connect() {
+    assert eventLoop.inEventLoop();
     LOG.trace("Connecting to {}", remoteId.address);
 
-    this.channel = new Bootstrap().group(rpcClient.group).channel(rpcClient.channelClass)
-        .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
-        .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
-        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
-        .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
-        .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
-
-          @Override
-          public void operationComplete(ChannelFuture future) throws Exception {
-            Channel ch = future.channel();
-            if (!future.isSuccess()) {
-              failInit(ch, toIOE(future.cause()));
-              rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
-              return;
-            }
-            ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
-            if (useSasl) {
-              saslNegotiate(ch);
-            } else {
-              // send the connection header to server
-              ch.write(connectionHeaderWithLength.retainedDuplicate());
-              established(ch);
-            }
-          }
-        }).channel();
-  }
+    this.channel = new Bootstrap().group(eventLoop).channel(rpcClient.channelClass)
+      .option(ChannelOption.TCP_NODELAY, rpcClient.isTcpNoDelay())
+      .option(ChannelOption.SO_KEEPALIVE, rpcClient.tcpKeepAlive)
+      .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, rpcClient.connectTO)
+      .handler(new BufferCallBeforeInitHandler()).localAddress(rpcClient.localAddr)
+      .remoteAddress(remoteId.address).connect().addListener(new ChannelFutureListener() {
 
-  private void write(Channel ch, final Call call) {
-    ch.writeAndFlush(call).addListener(new ChannelFutureListener() {
-
-      @Override
-      public void operationComplete(ChannelFuture future) throws Exception {
-        // Fail the call if we failed to write it out. This usually because the channel is
-        // closed. This is needed because we may shutdown the channel inside event loop and
-        // there may still be some pending calls in the event loop queue after us.
-        if (!future.isSuccess()) {
-          call.setException(toIOE(future.cause()));
+        @Override
+        public void operationComplete(ChannelFuture future) throws Exception {
+          Channel ch = future.channel();
+          if (!future.isSuccess()) {
+            failInit(ch, toIOE(future.cause()));
+            rpcClient.failedServers.addToFailedServers(remoteId.address, future.cause());
+            return;
+          }
+          ch.writeAndFlush(connectionHeaderPreamble.retainedDuplicate());
+          if (useSasl) {
+            saslNegotiate(ch);
+          } else {
+            // send the connection header to server
+            ch.write(connectionHeaderWithLength.retainedDuplicate());
+            established(ch);
+          }
         }
-      }
-    });
+      }).channel();
   }
 
-  @Override
-  public synchronized void sendRequest(final Call call, HBaseRpcController hrc) throws IOException {

Review comment:
       Hmm okay. I still think the locking can be made fine grained with rw locks for channel and totally get rid of eventloop but its okay, your perf run shows no regression, so its fine.

##########
File path: hbase-client/src/test/java/org/apache/hadoop/hbase/ipc/TestNettyRpcConnection.java
##########
@@ -0,0 +1,88 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.ipc;
+
+import static org.hamcrest.CoreMatchers.instanceOf;
+import static org.hamcrest.MatcherAssert.assertThat;
+import static org.junit.Assert.assertThrows;
+import static org.junit.Assert.fail;
+
+import java.io.IOException;
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.lang.reflect.Modifier;
+import java.net.InetSocketAddress;
+import org.apache.hadoop.hbase.HBaseClassTestRule;
+import org.apache.hadoop.hbase.HBaseConfiguration;
+import org.apache.hadoop.hbase.security.User;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.hbase.thirdparty.com.google.common.io.Closeables;
+
+public class TestNettyRpcConnection {
+
+  @ClassRule
+  public static final HBaseClassTestRule CLASS_RULE =
+    HBaseClassTestRule.forClass(TestNettyRpcConnection.class);
+
+  private static final Logger LOG = LoggerFactory.getLogger(TestNettyRpcConnection.class);
+
+  private static NettyRpcClient CLIENT;
+
+  private static NettyRpcConnection CONN;
+
+  @BeforeClass
+  public static void setUp() throws IOException {
+    CLIENT = new NettyRpcClient(HBaseConfiguration.create());
+    CONN = new NettyRpcConnection(CLIENT,
+      new ConnectionId(User.getCurrent(), "test", new InetSocketAddress("localhost", 1234)));
+  }
+
+  @AfterClass
+  public static void tearDown() throws IOException {
+    Closeables.close(CLIENT, true);
+  }
+
+  @Test
+  public void testPrivateMethodExecutedInEventLoop() throws IllegalAccessException {
+    // make sure the test is executed with "-ea"
+    assertThrows(AssertionError.class, () -> {
+      assert false;
+    });
+    for (Method method : NettyRpcConnection.class.getDeclaredMethods()) {
+      if (Modifier.isPrivate(method.getModifiers()) && !method.getName().contains("$")) {
+        LOG.info("checking {}", method);
+        method.setAccessible(true);
+        // all private methods should be called inside the event loop thread, so calling it from

Review comment:
       haha, nice test :-)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache9 commented on pull request #1858: HBASE-24506 async client deadlock

Posted by GitBox <gi...@apache.org>.
Apache9 commented on pull request #1858:
URL: https://github.com/apache/hbase/pull/1858#issuecomment-640232998


   Set up a 5 nodes cluster with the latest branch-2.2 code.
   
   Run
   `./bin/hbase pe --nomapred --presplit=5 randomWrite 10`
   To generated test data.
   
   Run
   `./bin/hbase pe --nomapred randomRead 10`
   Several times to load all data into block cache
   
   Run
   `./bin/hbase pe --rows=100000 --nomapred randomRead 10`
   Several times, before patch and after patch. Set --rows to 100k is because 1m rows requires too much time to finish.
   
   The latest result is
   Before patch: 
   ```
   2020-06-07 22:54:30,362 INFO  [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [141302, 141522, 140684, 138496, 140165, 141211, 141792, 141077, 141257, 137735]
   2020-06-07 22:54:30,418 INFO  [main] hbase.PerformanceEvaluation: [RandomReadTest duration ]    Min: 137735ms   Max: 141792ms   Avg: 140524ms
   2020-06-07 22:54:30,418 INFO  [main] hbase.PerformanceEvaluation: [ Avg latency (us)]   1403
   2020-06-07 22:54:30,418 INFO  [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS]        7117     row per second
   ```
   
   After patch:
   ```
   2020-06-07 22:49:25,923 INFO  [main] hbase.PerformanceEvaluation: [RandomReadTest] Summary of timings (ms): [140656, 140635, 139256, 140225, 141125, 140199, 140405, 142181, 140379, 140557]
   2020-06-07 22:49:25,980 INFO  [main] hbase.PerformanceEvaluation: [RandomReadTest duration ]    Min: 139256ms   Max: 142181ms   Avg: 140561ms
   2020-06-07 22:49:25,980 INFO  [main] hbase.PerformanceEvaluation: [ Avg latency (us)]   1403
   2020-06-07 22:49:25,980 INFO  [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS]        7114     row per second
   ```
   The avg latencies are between 139x-141x for both. As the difference before and after the patch is even smaller than the difference between each run, I do not think there is observable performance impact for the patch.
   
   Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org