Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/06/29 18:10:00 UTC

[jira] [Commented] (HBASE-24656) [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart

    [ https://issues.apache.org/jira/browse/HBASE-24656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148043#comment-17148043 ] 

Michael Stack commented on HBASE-24656:
---------------------------------------

Here is how the shutdown looks when all goes well:
{code}
2020-06-29 11:04:42,194 DEBUG [zk-event-processor-pool2-t1] zookeeper.ZKWatcher(580): @Before-0x100797d56510001 connected
2020-06-29 11:04:42,196 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/rs
2020-06-29 11:04:42,197 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/splitWAL
2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/backup-masters
2020-06-29 11:04:42,198 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/table
2020-06-29 11:04:42,199 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/draining
2020-06-29 11:04:42,200 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Set watcher on existing znode=/hbase/master-maintenance
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/master-maintenance
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(555): master:54310-0x100797d56510000, quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase
2020-06-29 11:04:42,213 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase
2020-06-29 11:04:42,214 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(555): @Before-0x100797d56510001, quorum=127.0.0.1:63507, baseZNode=/hbase Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/draining
2020-06-29 11:04:42,214 DEBUG [zk-event-processor-pool1-t1] zookeeper.ZKUtil(448): master:54310-0x100797d56510000, quorum=127.0.0.1:63507, baseZNode=/hbase Unable to list children of znode /hbase because node does not exist (not an error)
{code}

Here is the sequence when the test fails:
{code}
2020-06-29 15:21:07,638 DEBUG [zk-event-processor-pool2-t1] zookeeper.ZKWatcher(580): @Before-0x100c741374b0001 connected
2020-06-29 15:21:07,642 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/rs
2020-06-29 15:21:07,643 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/splitWAL
2020-06-29 15:21:07,645 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/backup-masters
2020-06-29 15:21:07,646 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/table
2020-06-29 15:21:07,647 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/draining
2020-06-29 15:21:07,649 DEBUG [Time-limited test] zookeeper.ZKUtil(358): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/master-maintenance
2020-06-29 15:21:07,666 DEBUG [Time-limited test-EventThread] zookeeper.ZKWatcher(555): @Before-0x100c741374b0001, quorum=127.0.0.1:62960, baseZNode=/hbase Received ZooKeeper Event, type=NodeChildrenChanged, state=SyncConnected, path=/hbase/backup-masters
2020-06-29 15:21:07,667 DEBUG [master/asf905:0:becomeActiveMaster] zookeeper.ZKUtil(358): master:33965-0x100c741374b0000, quorum=127.0.0.1:62960, baseZNode=/hbase Set watcher on existing znode=/hbase/backup-masters/asf905.gq1.ygridcore.net,33965,1593444064742
2020-06-29 15:21:07,701 INFO  [Time-limited test] zookeeper.ZKUtil(1809): multi exception: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty; running operations sequentially 
{code}
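
That last {{multi exception ... running operations sequentially}} line is the atomic multi delete failing and falling back to running each operation one at a time. A minimal sketch of that shape, using the plain ZooKeeper client (the class and method names here are illustrative, not HBase's exact ZKUtil internals):
{code}
import java.util.Collections;
import java.util.List;

import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooKeeper;

public class MultiOrSequentialSketch {
  /**
   * Try the ops as one atomic multi; if that throws (e.g. NotEmptyException
   * because a child showed up under a parent we are deleting), run the ops
   * one at a time so each failure is isolated to its own op.
   */
  static void multiOrSequential(ZooKeeper zk, List<Op> ops)
      throws KeeperException, InterruptedException {
    try {
      zk.multi(ops);
    } catch (KeeperException e) {
      // Mirrors the "multi exception: ...; running operations sequentially"
      // log line above: fall back to sequential execution.
      for (Op op : ops) {
        zk.multi(Collections.singletonList(op));
      }
    }
  }
}
{code}
Even run sequentially, the delete of {{/hbase/backup-masters}} still hits NotEmptyException once the late child exists, which is the error the teardown reports.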

The backup master's znode arrives under /hbase/backup-masters after the recursive delete has already listed the children, so the parent delete fails with NotEmptyException... The retry should help here. Let me push.
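
A minimal sketch of what such a retry could look like, again with the plain ZooKeeper client; the method name, retry bound, and structure are assumptions for illustration, not the actual patch:
{code}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class RetryingDeleteSketch {
  /**
   * Recursively delete path, retrying when a child is created between
   * listing the children and deleting the parent (the race seen above).
   */
  static void deleteRecursivelyWithRetries(ZooKeeper zk, String path, int maxRetries)
      throws KeeperException, InterruptedException {
    for (int attempt = 0; ; attempt++) {
      try {
        for (String child : zk.getChildren(path, false)) {
          deleteRecursivelyWithRetries(zk, path + "/" + child, maxRetries);
        }
        zk.delete(path, -1); // version -1 matches any node version
        return;
      } catch (KeeperException.NoNodeException e) {
        return; // already gone; nothing left to clean up
      } catch (KeeperException.NotEmptyException e) {
        // A child (here, the late backup-master znode) arrived after we
        // listed children but before the parent delete; re-list and retry.
        if (attempt >= maxRetries) {
          throw e;
        }
      }
    }
  }
}
{code}
On a failing run like the one above, a second pass would list the late {{asf905.gq1.ygridcore.net,33965,1593444064742}} child, delete it, and then succeed on the parent.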

> [Flakey Tests] branch-2 TestMasterNoCluster.testStopDuringStart
> ---------------------------------------------------------------
>
>                 Key: HBASE-24656
>                 URL: https://issues.apache.org/jira/browse/HBASE-24656
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Priority: Major
>
> org.apache.hadoop.hbase.master.TestMasterNoCluster.testStopDuringStart is (only) flakey on branch-2 currently. Fails here:
> Error Message
> KeeperErrorCode = Directory not empty for /hbase/backup-masters
> Stacktrace
> org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /hbase/backup-masters
> 	at org.apache.hadoop.hbase.master.TestMasterNoCluster.tearDown(TestMasterNoCluster.java:121)
> I can see the zk events in teardown as we purge children as part of cleanup. I can also see that the backup master registers later. Other than that, the log is opaque on why the teardown is failing. This is just cleanup, so I'm adding a retry to see if that helps.


