Posted to issues@hbase.apache.org by "Bharath Vissapragada (Jira)" <ji...@apache.org> on 2020/06/01 01:55:00 UTC
[jira] [Commented] (HBASE-24480) Deflake TestRSGroupsBasics#testClearDeadServers
[ https://issues.apache.org/jira/browse/HBASE-24480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17120691#comment-17120691 ]
Bharath Vissapragada commented on HBASE-24480:
----------------------------------------------
I think this is the root cause. The problem happens when every requested server comes back as not cleared, so the postClearDeadServers hook ends up passing an empty set to removeServers():
{noformat}
public void postClearDeadServers(ObserverContext<MasterCoprocessorEnvironment> ctx,
    List<ServerName> servers, List<ServerName> notClearedServers)
    throws IOException {
  Set<Address> clearedServer = Sets.newHashSet();
  for (ServerName server: servers) {
    if (!notClearedServers.contains(server)) {
      clearedServer.add(server.getAddress());
    }
  }
  groupAdminServer.removeServers(clearedServer); // <== clearedServer is empty
}
{noformat}
The cause of this is in the clearDeadServers() RPC:
{noformat}
if (master.getServerManager().areDeadServersInProgress()) {
  LOG.debug("Some dead server is still under processing, won't clear the dead server list"); // <=======
  response.addAllServerName(request.getServerNameList());
} else {
  for (HBaseProtos.ServerName pbServer : request.getServerNameList()) {
    if (!master.getServerManager().getDeadServers()
        .removeDeadServer(ProtobufUtil.toServerName(pbServer))) {
      response.addServerName(pbServer);
    }
  }
}
{noformat}
I could see the LOG.debug() message in the logs. It's the same region server that was stopped. Essentially, a dead server was still being processed, so the current request was rejected and every requested server was returned as not cleared. The fix is essentially the following:
- Don't execute the post hook if no server was cleared.
- Make the test more robust so that it handles this RPC failure.
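
To make the two points above concrete, here is a minimal, HBase-free sketch. Everything in it is hypothetical: ClearDeadServersSketch, FakeMaster, clearedServers and clearWithRetry are illustrative names, plain strings stand in for ServerName, and a counter stands in for the call to groupAdminServer.removeServers(). It only models the behaviour described in this comment and is not the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical model of the clearDeadServers flow; not HBase code. */
class ClearDeadServersSketch {

  /** Stand-in for the master side of the RPC plus the rsgroup post hook. */
  static class FakeMaster {
    int busyCalls;                 // upcoming calls during which a dead server is "in progress"
    final Set<String> deadServers; // servers currently on the dead server list
    int postHookRuns = 0;          // counts simulated removeServers() invocations

    FakeMaster(int busyCalls, List<String> dead) {
      this.busyCalls = busyCalls;
      this.deadServers = new HashSet<>(dead);
    }

    /** Returns the servers that were NOT cleared, mirroring the RPC response. */
    List<String> clearDeadServers(List<String> requested) {
      List<String> notCleared = new ArrayList<>();
      if (busyCalls > 0) {
        // Dead server still being processed: reject the whole request.
        busyCalls--;
        notCleared.addAll(requested);
      } else {
        for (String server : requested) {
          if (!deadServers.remove(server)) {
            notCleared.add(server);
          }
        }
      }
      postClearDeadServers(requested, notCleared);
      return notCleared;
    }

    /** Post hook with the proposed guard: do nothing when no server was cleared. */
    void postClearDeadServers(List<String> servers, List<String> notCleared) {
      Set<String> cleared = clearedServers(servers, notCleared);
      if (cleared.isEmpty()) {
        return; // without this guard, removeServers() would reject the empty set
      }
      postHookRuns++; // stands in for groupAdminServer.removeServers(cleared)
    }
  }

  /** Servers that were requested and actually cleared. */
  static Set<String> clearedServers(List<String> servers, List<String> notCleared) {
    Set<String> cleared = new HashSet<>(servers);
    cleared.removeAll(notCleared);
    return cleared;
  }

  /**
   * Test-side retry: repeat the request until nothing is left uncleared.
   * Returns the number of attempts used, or -1 if maxAttempts was exhausted.
   * A real test would also wait between attempts.
   */
  static int clearWithRetry(FakeMaster master, List<String> servers, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (master.clearDeadServers(servers).isEmpty()) {
        return attempt;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<String> servers = Arrays.asList("rs1:16020");
    FakeMaster master = new FakeMaster(2, servers);
    int attempts = clearWithRetry(master, servers, 5);
    System.out.println("attempts=" + attempts + ", postHookRuns=" + master.postHookRuns);
  }
}
```

In this model the first two requests are rejected and the guard skips the hook both times; the third request clears the server and the hook runs once.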
> Deflake TestRSGroupsBasics#testClearDeadServers
> -----------------------------------------------
>
> Key: HBASE-24480
> URL: https://issues.apache.org/jira/browse/HBASE-24480
> Project: HBase
> Issue Type: Bug
> Components: rsgroup
> Affects Versions: 2.3.0, 1.7.0
> Reporter: Bharath Vissapragada
> Assignee: Bharath Vissapragada
> Priority: Major
>
> Ran into this on our internal forks based on branch-1. It also applies to branch-2, but not to master, because the code there has been re-implemented without a coprocessor due to HBASE-22514.
> Running into this exception in the test run:
> {noformat}
> org.apache.hadoop.hbase.constraint.ConstraintException:
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
> at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
> at org.apache.hadoop.hbase.rsgroup.TestRSGroupsBasics.testClearDeadServers(TestRSGroupsBasics.java:215)
> Caused by: org.apache.hadoop.hbase.ipc.RemoteWithExtrasException:
> org.apache.hadoop.hbase.constraint.ConstraintException: The set of servers to remove cannot be null or empty.
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminServer.removeServers(RSGroupAdminServer.java:391)
> at org.apache.hadoop.hbase.rsgroup.RSGroupAdminEndpoint.postClearDeadServers(RSGroupAdminEndpoint.java:1175)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost$104.call(MasterCoprocessorHost.java:1251)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.execOperation(MasterCoprocessorHost.java:1507)
> at org.apache.hadoop.hbase.master.MasterCoprocessorHost.postClearDeadServers(MasterCoprocessorHost.java:1247)
> at org.apache.hadoop.hbase.master.MasterRpcServices.clearDeadServers(MasterRpcServices.java:1167)
> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2421)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:311)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:291)
> {noformat}