Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/04/08 07:19:00 UTC
[jira] [Commented] (HBASE-20362) TestMasterShutdown.testMasterShutdownBeforeStartingAnyRegionServer is flaky

    [ https://issues.apache.org/jira/browse/HBASE-20362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429653#comment-16429653 ] 

Duo Zhang commented on HBASE-20362:
-----------------------------------

OK, we have enough errors in the logs. I think the problem is that we do not have any region servers in this test, so serverManager.shutdownCluster calls master.stop immediately, and HRegionServer.run then starts closing the related resources, such as the zookeeper connection, the rpc server, and so on. If it runs quickly enough to stop the rpc server before we send the response back, the admin.shutdown call fails, we never call cluster.waitOnMaster, and the test fails.

I do not think this is a big deal, so I prefer to define this as a testcase problem. In fact, a shutdown call which ends with connection refused is expected since the server shuts itself down...

See the shutdown command for redis

https://redis.io/commands/shutdown

{noformat}
Return value
Simple string reply on error. On success nothing is returned since the server quits and the connection is closed.
{noformat}

So I think we should move the cluster.waitOnMaster(MASTER_INDEX); call out of the try block here. We should also add comments to Admin.shutdown indicating that you may not get a response because the server has already shut itself down, and that this is expected.
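
A rough sketch of the restructuring I have in mind. This is not the actual patch: ShutdownAdmin and MasterWaiter are hypothetical stand-ins for the single Admin and cluster methods the test uses, so that the snippet compiles without HBase on the classpath. The only point is that the wait happens whether or not the shutdown rpc gets a response, and that the failure is recorded instead of silently dropped.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ShutdownSketch {
  /** Hypothetical stand-in for the one Admin method the test calls. */
  interface ShutdownAdmin {
    void shutdown() throws IOException;
  }

  /** Hypothetical stand-in for the mini cluster's waitOnMaster. */
  interface MasterWaiter {
    void waitOnMaster(int index);
  }

  static final int MASTER_INDEX = 0;

  /** Returns the sequence of steps taken, so the ordering can be inspected. */
  static List<String> shutdownAndWait(ShutdownAdmin admin, MasterWaiter cluster) {
    List<String> events = new ArrayList<>();
    events.add("before shutdown");
    try {
      admin.shutdown();
      events.add("shutdown returned");
    } catch (IOException e) {
      // Expected when the master closes its rpc server before replying.
      // Record it instead of swallowing it silently.
      events.add("shutdown failed: " + e.getMessage());
    }
    // Outside the try block: we wait for the master in both cases.
    cluster.waitOnMaster(MASTER_INDEX);
    events.add("master stopped");
    return events;
  }

  public static void main(String[] args) {
    // Simulate a master that drops the connection before responding.
    List<String> events = shutdownAndWait(
        () -> { throw new IOException("Connection refused"); },
        index -> { /* pretend the master thread has exited */ });
    System.out.println(events);
  }
}
```

In the real test the same shape would apply to the body of Shutdown-Thread, with the IOException logged through LOG rather than collected in a list.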

> TestMasterShutdown.testMasterShutdownBeforeStartingAnyRegionServer is flaky
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-20362
>                 URL: https://issues.apache.org/jira/browse/HBASE-20362
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Duo Zhang
>            Priority: Major
>
> {code}
>     Thread shutdownThread = new Thread("Shutdown-Thread") {
>       @Override
>       public void run() {
>         LOG.info("Before call to shutdown master");
>         try {
>           try (Connection connection =
>               ConnectionFactory.createConnection(util.getConfiguration())) {
>             try (Admin admin = connection.getAdmin()) {
>               admin.shutdown();
>             }
>           }
>           LOG.info("After call to shutdown master");
>           cluster.waitOnMaster(MASTER_INDEX);
>         } catch (Exception e) {
>         }
>       }
>     };
> {code}
> https://builds.apache.org/job/HBASE-Flaky-Tests/28970/artifact/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.master.TestMasterShutdown-output.txt
> In the output for a failed run, we only have 'Before call to shutdown master' but no 'After call to shutdown master', so I think something must have gone wrong when calling admin.shutdown, but the catch block just ignores the exception.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)