Posted to issues@hbase.apache.org by "Bharath Vissapragada (Jira)" <ji...@apache.org> on 2020/06/01 01:55:00 UTC

[jira] [Commented] (HBASE-24480) Deflake TestRSGroupsBasics#testClearDeadServers

    [ https://issues.apache.org/jira/browse/HBASE-24480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120691#comment-17120691 ] 

Bharath Vissapragada commented on HBASE-24480:
----------------------------------------------

I think the root cause is this: the postClearDeadServers hook runs even when no server was actually cleared, so it ends up calling removeServers() with an empty set.

{noformat}
public void postClearDeadServers(ObserverContext<MasterCoprocessorEnvironment> ctx,
      List<ServerName> servers, List<ServerName> notClearedServers)
      throws IOException {
    Set<Address> clearedServer = Sets.newHashSet();
    for (ServerName server: servers) {
      if (!notClearedServers.contains(server)) {
        clearedServer.add(server.getAddress());
      }
    }
    groupAdminServer.removeServers(clearedServer); // <== clearedServer is empty here
  }
{noformat}

The cause lies in the clearDeadServers() RPC:

{noformat}
      if (master.getServerManager().areDeadServersInProgress()) {
        LOG.debug("Some dead server is still under processing, won't clear the dead server list");  // <== seen in the test logs
        response.addAllServerName(request.getServerNameList());
      } else {
        for (HBaseProtos.ServerName pbServer : request.getServerNameList()) {
          if (!master.getServerManager().getDeadServers()
                  .removeDeadServer(ProtobufUtil.toServerName(pbServer))) {
            response.addServerName(pbServer);
          }
        }
      }
{noformat}

I could see the LOG.debug() message in the logs, and it refers to the same region server that was stopped. Essentially, a dead server was still being processed, so the current request was rejected and every requested server came back as not cleared. The fix is essentially the following:

- Don't execute the post hook if no server is cleared.
- Make the test more robust so that it handles this RPC failure.
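
A rough, self-contained sketch of both points, not the actual patch. Plain strings stand in for ServerName/Address, and the class name, the removeServers callback, and the clearRpc function are hypothetical stand-ins for groupAdminServer.removeServers() and the clearDeadServers() RPC:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;

/** Illustration only: strings replace the HBase ServerName/Address types. */
final class ClearDeadServersSketch {

  /** Requested servers that do not appear in the not-cleared list. */
  static Set<String> clearedServers(List<String> requested, List<String> notCleared) {
    Set<String> cleared = new LinkedHashSet<>(requested);
    cleared.removeAll(notCleared);
    return cleared;
  }

  /**
   * Point 1: call the (hypothetical) removeServers callback only when at
   * least one server was cleared. Returns true if the callback ran.
   */
  static boolean postClearDeadServers(List<String> requested, List<String> notCleared,
      Consumer<Set<String>> removeServers) {
    Set<String> cleared = clearedServers(requested, notCleared);
    if (cleared.isEmpty()) {
      return false; // nothing was cleared, so there is nothing to remove
    }
    removeServers.accept(cleared);
    return true;
  }

  /**
   * Point 2: a test-side retry. clearRpc is a stand-in that returns the
   * servers it could NOT clear; retry until that list is empty or the
   * attempts run out.
   */
  static boolean clearWithRetry(Function<List<String>, List<String>> clearRpc,
      List<String> deadServers, int maxAttempts, long sleepMillis) {
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      if (clearRpc.apply(deadServers).isEmpty()) {
        return true;
      }
      try {
        Thread.sleep(sleepMillis); // give dead-server processing time to finish
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return false;
  }
}
```

With the guard in place, a rejected request (where every requested server comes back as not cleared) simply skips the removal instead of tripping the "cannot be null or empty" check, and the test can keep retrying until dead-server processing has finished.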




> Deflake TestRSGroupsBasics#testClearDeadServers
> -----------------------------------------------
>
>                 Key: HBASE-24480
>                 URL: https://issues.apache.org/jira/browse/HBASE-24480
>             Project: HBase
>          Issue Type: Bug
>          Components: rsgroup
>    Affects Versions: 2.3.0, 1.7.0
>            Reporter: Bharath Vissapragada
>            Assignee: Bharath Vissapragada
>            Priority: Major
>
> Ran into this on our internal forks based on branch-1. It also applies to branch-2, but not to master, because the code there has been re-implemented without the coprocessor due to HBASE-22514.
> The test run hits this exception:
> {noformat}
> org.apache.hadoop.hbase.constraint.ConstraintException: 
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
> 	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
> 	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
> 	at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
> 	at org.apache.hadoop.hbase.rsgroup.TestRSGroupsBasics.testClearDeadServers(TestRSGroupsBasics.java:215)
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException: 
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
> 	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
> 	at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
> 	at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
> 	at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
> 	at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> 	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
> 	at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
> 	at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
> {noformat}


