Posted to dev@hbase.apache.org by "Duo Zhang (Jira)" <ji...@apache.org> on 2022/07/08 06:51:00 UTC

[jira] [Resolved] (HBASE-27169) TestSeparateClientZKCluster is flaky

     [ https://issues.apache.org/jira/browse/HBASE-27169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang resolved HBASE-27169.
-------------------------------
    Hadoop Flags: Reviewed
      Resolution: Fixed

The test is much more stable than before, but some runs still fail.

https://ci-hbase.apache.org/job/HBase-Flaky-Tests/job/master/3818/testReport/junit/org.apache.hadoop.hbase.client/TestSeparateClientZKCluster/testMetaMoveDuringClientZkClusterRestart/

Will open other issues to fix the remaining failures.

> TestSeparateClientZKCluster is flaky
> ------------------------------------
>
>                 Key: HBASE-27169
>                 URL: https://issues.apache.org/jira/browse/HBASE-27169
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: 2.5.0, 3.0.0-alpha-4, 2.4.14
>
>
> https://nightlies.apache.org/hbase/HBase-Flaky-Tests/master/3773/hbase-server/target/surefire-reports/org.apache.hadoop.hbase.client.TestSeparateClientZKCluster-output.txt
> {noformat}
> org.apache.hadoop.hbase.exceptions.MasterStoppedException: null
> 	at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:3177) ~[classes/:?]
> 	at org.apache.hadoop.hbase.master.HMaster.balance(HMaster.java:1954) ~[classes/:?]
> 	at org.apache.hadoop.hbase.master.MasterRpcServices.balance(MasterRpcServices.java:743) ~[classes/:?]
> 	at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) ~[hbase-protocol-shaded-3.0.0-alpha-4-SNAPSHOT.jar:3.0.0-alpha-4-SNAPSHOT]
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:385) ~[classes/:?]
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124) ~[classes/:?]
> 	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:104) ~[classes/:?]
> 	at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:84) ~[classes/:?]
> {noformat}
> I think the problem is that MasterStoppedException is a subclass of DoNotRetryIOException, so when we hit this issue, the call fails immediately instead of being retried.
> Also, the client zk syncer is asynchronous, so it is possible that when we call admin.balance, we haven't synced the new location yet, and the call quickly throws MasterStoppedException and fails the UT.
> Let me see how to fix it.
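
The failure mode described in the issue can be illustrated with a small, self-contained Java sketch. It is not the change committed for HBASE-27169 and does not use the HBase client. DoNotRetryIOException and MasterStoppedException are redefined locally as stand-ins for the real HBase classes, and RetrySketch, StaleThenOk, callWithRetries and waitWhileMasterStopped are hypothetical names introduced only for illustration. The sketch assumes a retry loop that retries ordinary IOExceptions but rethrows any DoNotRetryIOException at once, and shows one possible test-side workaround: polling until the call stops failing with MasterStoppedException.

```java
import java.io.IOException;
import java.io.InterruptedIOException;

// Stand-ins for the HBase exception types named in the issue, redefined here
// only so the sketch compiles without HBase on the classpath.
class DoNotRetryIOException extends IOException {}

class MasterStoppedException extends DoNotRetryIOException {}

class RetrySketch {
  /** A remote call that may fail with an IOException. */
  interface Call<T> {
    T run() throws IOException;
  }

  /** Simulated admin.balance: fails while the client still has a stale location. */
  static final class StaleThenOk implements Call<Boolean> {
    private final int staleCalls;
    int calls = 0; // number of times run() has been invoked

    StaleThenOk(int staleCalls) {
      this.staleCalls = staleCalls;
    }

    @Override
    public Boolean run() throws IOException {
      calls++;
      if (calls <= staleCalls) {
        throw new MasterStoppedException();
      }
      return Boolean.TRUE;
    }
  }

  /** Hypothetical retry loop: retries IOException, but never DoNotRetryIOException. */
  static <T> T callWithRetries(Call<T> call, int maxAttempts) throws IOException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    IOException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return call.run();
      } catch (DoNotRetryIOException e) {
        throw e; // fail fast: this is why the UT fails on the first attempt
      } catch (IOException e) {
        last = e; // retriable, try again
      }
    }
    throw last;
  }

  /** Hypothetical test-side workaround: poll until MasterStoppedException goes away. */
  static <T> T waitWhileMasterStopped(Call<T> call, int maxPolls, long sleepMs)
      throws IOException {
    if (maxPolls < 1) {
      throw new IllegalArgumentException("maxPolls must be >= 1");
    }
    MasterStoppedException last = null;
    for (int poll = 0; poll < maxPolls; poll++) {
      try {
        return call.run();
      } catch (MasterStoppedException e) {
        last = e; // location not synced yet in this simulation, wait and retry
        try {
          Thread.sleep(sleepMs);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt();
          throw new InterruptedIOException("interrupted while waiting for master");
        }
      }
    }
    throw last;
  }

  public static void main(String[] args) throws IOException {
    StaleThenOk failing = new StaleThenOk(2);
    try {
      callWithRetries(failing, 5);
    } catch (MasterStoppedException e) {
      System.out.println("failed fast after " + failing.calls + " call(s)");
    }
    StaleThenOk polled = new StaleThenOk(2);
    System.out.println("polling result: " + waitWhileMasterStopped(polled, 10, 1L));
  }
}
```

In this simulation the retry budget of the first call is never used, because the first MasterStoppedException ends the loop, while the polling variant tolerates the window in which the new location has not been synced yet.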


