You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/05/08 13:43:15 UTC

[GitHub] [hbase] virajjasani opened a new pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

virajjasani opened a new pull request #1684:
URL: https://github.com/apache/hbase/pull/1684


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626235939


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 50s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 55s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 11s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 33s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 16s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 151m 19s |  hbase-server in the patch passed.  |
   |  |   | 176m 33s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 87ff77db30a4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/testReport/ |
   | Max. process+thread count | 4058 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani edited a comment on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani edited a comment on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626314537


   > A couple of changes, otherwise patch looks fine to me.
   
   Thanks for the review. I addressed your concerns and ran test 115 + 125 times in a loop with the latest patch. Let me merge it and we can start observing flaky reports.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626303991


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 34s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 30s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 31s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 41s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  7s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 48s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 50s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 141m 25s |  hbase-server in the patch passed.  |
   |  |   | 167m 44s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 930f6a158593 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/testReport/ |
   | Max. process+thread count | 3889 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422508178



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       Thanks @bharathv for the context. IMHO we should make shutdown() async in the test until fixing HBASE-24070.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422339899



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       Btw the root cause that I have seen for the above Exception:
   
   ```
   2020-05-09 00:58:39,957 ERROR [RpcServer.priority.RWQ.Fifo.read.handler=1,queue=1,port=53033] master.HMaster(2878): ZooKeeper exception trying to set cluster as down in ZK
   org.apache.zookeeper.KeeperException$SystemErrorException: KeeperErrorCode = SystemError
   	at org.apache.hadoop.hbase.zookeeper.ZKWatcher.interruptedException(ZKWatcher.java:626)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1285)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1269)
   	at org.apache.hadoop.hbase.zookeeper.ClusterStatusTracker.setClusterDown(ClusterStatusTracker.java:84)
   	at org.apache.hadoop.hbase.master.HMaster.shutdown(HMaster.java:2876)
   	at org.apache.hadoop.hbase.master.MasterRpcServices.shutdown(MasterRpcServices.java:1630)
   	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
   	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:395)
   	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
   	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
   	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
   Caused by: java.lang.InterruptedException
   	at java.lang.Object.wait(Native Method)
   	at java.lang.Object.wait(Object.java:502)
   	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1529)
   	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1512)
   	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1791)
   	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:171)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1280)
   	... 9 more
   2020-05-09 00:58:39,957 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(490): master:53033-0x1000d0785dd0000, quorum=127.0.0.1:60460, baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
   2020-05-09 00:58:39,957 INFO  [M:0;172.20.10.2:53033] regionserver.HRegionServer(1119): stopping server 172.20.10.2,53033,1588966118448; all regions closed.
   2020-05-09 00:58:39,958 INFO  [M:0;172.20.10.2:53033] hbase.ChoreService(329): Chore service for: master/172.20.10.2:0 had [] on shutdown
   2020-05-09 00:58:39,958 DEBUG [M:0;172.20.10.2:53033] master.HMaster(1516): Stopping service threads
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626474121


   Oh no, this failed again: https://builds.apache.org/job/HBase-Flaky-Tests/job/master/6216/testReport/org.apache.hadoop.hbase.master/TestMasterShutdown/testMasterShutdownBeforeStartingAnyRegionServer/  :( 
   
   Let me reopen the Jira.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422555603



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {

Review comment:
       we can use htu.getConnection()?

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {
+            LOG.info("Sending shutdown RPC.");
+            try {
+              conn.getAdmin().shutdown().join();
+              LOG.info("Shutdown RPC sent.");
+              return true;
+            } catch (CompletionException e) {
+              LOG.error("Failure sending shutdown RPC.");
+            }
+          } catch (IOException|CompletionException e) {
+            LOG.error("Failed to establish connection.");
+          } catch (Throwable e) {
+            LOG.error("Something unexpected happened.", e);
+          }
+          return false;
+        });
+        assertNotEquals("Failed to issue shutdown RPC after " + Duration.ofMillis(timeout),
+          -1, result);
+      });
+
       masterThread.start();

Review comment:
       Shouldn't this be _before_ we trigger the shutdown command?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422412820



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       Ya, doesn't look like it. This is a known problem I think (Root cause is a flaky Zk connection. I can repro it locally once every ~100 runs). The problem here seems to me that shutdown is running inline with the rpc call [1]. Nick actually made the shutdown call async in HBASE-23808 but Stack undid that in HBASE-24052 [2]. Perhaps the fix is to actually make it async in test until we fix HBASE-24070? Thoughts? (@saintstack curious what the rationale was in HBASE-24052).
   
   [1] https://issues.apache.org/jira/browse/HBASE-24070 
   [2] https://issues.apache.org/jira/browse/HBASE-24052?focusedCommentId=17068822&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17068822




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422315817



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       This is what I thought that after waitFor(), bootstrap should be over but in 2/50 cases, this stacktrace comes. This test is also present in flaky report and I just realized that flaky results data accessible is 2 days after Duo's commit. Let me check 5-10 older builds before the commit and see if this was not flaky before.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422675456



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {

Review comment:
       Hmm since it is already in ForkJoin, this doesn't matter much. I don't have strong opinion, but it's just that we are directly using AsyncConnection and AsyncAdmin rather than via Connection and Admin interfaces. But yes even htu.getConnection() should work after putting ZK recovery configs in htu directly.
   However, since it's not much of a diff, do you want Addendum now, or let's wait for some time and see reports and then we can add it maybe in a week?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422598671



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {

Review comment:
       since we want to use AsynAdmin with AsyncConnection, htu.getConnection() wouldn't work.

##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {
+            LOG.info("Sending shutdown RPC.");
+            try {
+              conn.getAdmin().shutdown().join();
+              LOG.info("Shutdown RPC sent.");
+              return true;
+            } catch (CompletionException e) {
+              LOG.error("Failure sending shutdown RPC.");
+            }
+          } catch (IOException|CompletionException e) {
+            LOG.error("Failed to establish connection.");
+          } catch (Throwable e) {
+            LOG.error("Something unexpected happened.", e);
+          }
+          return false;
+        });
+        assertNotEquals("Failed to issue shutdown RPC after " + Duration.ofMillis(timeout),
+          -1, result);
+      });
+
       masterThread.start();

Review comment:
       Oh yes, let me update.
   Thanks




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422305113



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       I have some on context on this test. Did you happen to dig into the root cause of the ConnectionRefused stack trace that you posted (you have the full stack trace with failed RPC name?)? 
   
   Ideally that shouldn't happen right? The waitFor() above means the bootstrap of the master is complete and it should be able to process the shutdown() command, in a normal case. Just want to be sure we are not masking a real bug, especially after Duo moved all the code from RPC to registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626289463


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   4m 15s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 40s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   1m 59s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  3s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 15s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  7s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 14s |  The patch does not generate ASF License warnings.  |
   |  |   |  36m 15s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux bf41cd66476f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-625837709


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 32s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 42s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  0s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 25s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m  2s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  11m 11s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m  6s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 15s |  The patch does not generate ASF License warnings.  |
   |  |   |  32m 31s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux 0104adf7bcdd 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 0cd70ed89c |
   | Max. process+thread count | 94 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626235181


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m 50s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 31s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  6s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 26s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 42s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  6s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  5s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 49s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 50s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 141m 34s |  hbase-server in the patch passed.  |
   |  |   | 169m 11s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 1ddcfb3acd81 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/testReport/ |
   | Max. process+thread count | 4011 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422318551



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       Test results before 5th May doesn't seem available and the flakes seem present on the day of commit (need to check timezone diff): https://builds.apache.org/job/HBase-Flaky-Tests/job/master/6184/




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626304880


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 32s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 47s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 55s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 11s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 38s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 37s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 54s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 10s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 35s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 151m 38s |  hbase-server in the patch passed.  |
   |  |   | 175m 12s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux befbc1b813e4 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/testReport/ |
   | Max. process+thread count | 3991 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/3/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-625891717


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 33s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 12s |  master passed  |
   | +1 :green_heart: |  compile  |   1m  3s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 24s |  branch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 42s |  hbase-server in master failed.  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  0s |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  3s |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m  3s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 20s |  patch has no errors when building our shaded downstream artifacts.  |
   | -0 :warning: |  javadoc  |   0m 40s |  hbase-server in the patch failed.  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 122m 41s |  hbase-server in the patch passed.  |
   |  |   | 147m 45s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux cf2b58b7e138 4.15.0-60-generic #67-Ubuntu SMP Thu Aug 22 16:55:30 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 0cd70ed89c |
   | Default Java | 2020-01-14 |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/artifact/yetus-jdk11-hadoop3-check/output/branch-javadoc-hbase-server.txt |
   | javadoc | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/artifact/yetus-jdk11-hadoop3-check/output/patch-javadoc-hbase-server.txt |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/testReport/ |
   | Max. process+thread count | 3837 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] bharathv commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
bharathv commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422669157



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -151,19 +156,47 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       hbaseCluster = new LocalHBaseCluster(htu.getConfiguration(), options.getNumMasters(),
         options.getNumRegionServers(), options.getMasterClass(), options.getRsClass());
       final MasterThread masterThread = hbaseCluster.getMasters().get(0);
+
+      final CompletableFuture<Void> shutdownFuture = CompletableFuture.runAsync(() -> {
+        // Switching to master registry exacerbated a race in the master bootstrap that can result
+        // in a lost shutdown command (HBASE-8422, HBASE-23836). The race is essentially because
+        // the server manager in HMaster is not initialized by the time shutdown() RPC (below) is
+        // made to the master. The suspected reason as to why it was uncommon before HBASE-18095
+        // is because the connection creation with ZK registry is so slow that by then the server
+        // manager is usually init'ed in time for the RPC to be made. For now, adding an explicit
+        // wait() in the test, waiting for the server manager to become available.
+        final long timeout = TimeUnit.MINUTES.toMillis(10);
+        assertNotEquals("timeout waiting for server manager to become available.", -1,
+          htu.waitFor(timeout, () -> masterThread.getMaster().getServerManager() != null));
+
+        // Master has come up far enough that we can terminate it without creating a zombie.
+        final long result = htu.waitFor(timeout, 1000, () -> {
+          final Configuration conf = createResponsiveZkConfig(htu.getConfiguration());
+          LOG.debug("Attempting to establish connection.");
+          final CompletableFuture<AsyncConnection> connFuture =
+            ConnectionFactory.createAsyncConnection(conf);
+          try (final AsyncConnection conn = connFuture.join()) {

Review comment:
       This is already running inside a thread. What purpose does AsyncAdmin serve? Also, you are joining on it right away.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626219964


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  7s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  5s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   1m 14s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   2m  4s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 47s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   1m 10s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m 23s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.1.  |
   | +1 :green_heart: |  spotbugs  |   2m 12s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m 13s |  The patch does not generate ASF License warnings.  |
   |  |   |  35m 52s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | dupname asflicense spotbugs hadoopcheck hbaseanti checkstyle |
   | uname | Linux a288fff967f5 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f3c0a0e06d |
   | Max. process+thread count | 84 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/2/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) spotbugs=3.1.12 |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] Apache-HBase commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-625924985


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   1m  6s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m  6s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 57s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   5m 40s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 44s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 58s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   5m 42s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 36s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 190m 52s |  hbase-server in the patch passed.  |
   |  |   | 216m  6s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | Client=19.03.8 Server=19.03.8 base: https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/1684 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 6a419a89fb0b 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / 0cd70ed89c |
   | Default Java | 1.8.0_232 |
   |  Test Results | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/testReport/ |
   | Max. process+thread count | 3242 (vs. ulimit of 12500) |
   | modules | C: hbase-server U: hbase-server |
   | Console output | https://builds.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-1684/1/console |
   | versions | git=2.17.1 maven=(cecedd343002696d0abb50b32b541b8a6ba2883f) |
   | Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626315661


   Oops, I should have updated the title


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on a change in pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on a change in pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#discussion_r422339899



##########
File path: hbase-server/src/test/java/org/apache/hadoop/hbase/master/TestMasterShutdown.java
##########
@@ -163,7 +164,16 @@ public void testMasterShutdownBeforeStartingAnyRegionServer() throws Exception {
       assertNotEquals("Timeout waiting for server manager to become available.",
         -1, Waiter.waitFor(htu.getConfiguration(), timeout,
           () -> masterThread.getMaster().getServerManager() != null));
-      htu.getConnection().getAdmin().shutdown();
+      try {
+        htu.getConnection().getAdmin().shutdown();

Review comment:
       Btw the root cause that I have seen for the above Exception:
   
   ```
   2020-05-09 00:58:39,957 ERROR [RpcServer.priority.RWQ.Fifo.read.handler=1,queue=1,port=53033] master.HMaster(2878): ZooKeeper exception trying to set cluster as down in ZK
   org.apache.zookeeper.KeeperException$SystemErrorException: KeeperErrorCode = SystemError
   	at org.apache.hadoop.hbase.zookeeper.ZKWatcher.interruptedException(ZKWatcher.java:626)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1285)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1269)
   	at org.apache.hadoop.hbase.zookeeper.ClusterStatusTracker.setClusterDown(ClusterStatusTracker.java:84)
   	at org.apache.hadoop.hbase.master.HMaster.shutdown(HMaster.java:2876)
   	at org.apache.hadoop.hbase.master.MasterRpcServices.shutdown(MasterRpcServices.java:1630)
   	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
   	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:395)
   	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:133)
   	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:338)
   	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:318)
   Caused by: java.lang.InterruptedException
   	at java.lang.Object.wait(Native Method)
   	at java.lang.Object.wait(Object.java:502)
   	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1529)
   	at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1512)
   	at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:1791)
   	at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:171)
   	at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1280)
   	... 9 more
   2020-05-09 00:58:39,957 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(490): master:53033-0x1000d0785dd0000, quorum=127.0.0.1:60460, baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master
   2020-05-09 00:58:39,957 INFO  [M:0;172.20.10.2:53033] regionserver.HRegionServer(1119): stopping server 172.20.10.2,53033,1588966118448; all regions closed.
   2020-05-09 00:58:39,958 INFO  [M:0;172.20.10.2:53033] hbase.ChoreService(329): Chore service for: master/172.20.10.2:0 had [] on shutdown
   2020-05-09 00:58:39,958 DEBUG [M:0;172.20.10.2:53033] master.HMaster(1516): Stopping service threads
   ```
   
   Does not look like issue with moving code to registry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hbase] virajjasani commented on pull request #1684: HBASE-24327 : Handle shutdown() if master cannot be contacted

Posted by GitBox <gi...@apache.org>.
virajjasani commented on pull request #1684:
URL: https://github.com/apache/hbase/pull/1684#issuecomment-626314537


   > A couple of changes, otherwise patch looks fine to me.
   
   Thanks for the review. I addressed your concerns and ran test 115 + 125 times in a loop with the latest patch. Let me merge it and we can observe flaky reports.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org