Posted to issues@hbase.apache.org by "Yu Li (JIRA)" <ji...@apache.org> on 2016/07/08 07:27:11 UTC
[jira] [Updated] (HBASE-16201) NPE in RpcServer causing
intermittent UT failure of TestMasterReplication#testHFileCyclicReplication
[ https://issues.apache.org/jira/browse/HBASE-16201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yu Li updated HBASE-16201:
--------------------------
Attachment: HBASE-16201.patch
A straightforward patch. With it applied, the debug message below shows up in the UT log and {{TestMasterReplication#testHFileCyclicReplication}} no longer fails.
{noformat}
2016-07-08 15:12:17,818 DEBUG [RpcServer.FifoWFPBQ.replication.handler=2,queue=0,port=57350] ipc.RpcServer(2251): Caught a ServiceException with null cause
com.google.protobuf.ServiceException: Replication services are not initialized yet
at org.apache.hadoop.hbase.regionserver.RSRpcServices.replicateWALEntry(RSRpcServices.java:1929)
at org.apache.hadoop.hbase.protobuf.generated.AdminProtos$AdminService$2.callBlockingMethod(AdminProtos.java:22751)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2212)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
{noformat}
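For illustration, the guard behind that debug message can be sketched roughly as follows. This is a minimal standalone sketch, not the attached patch: the nested {{ServiceException}} class is a hypothetical stand-in for {{com.google.protobuf.ServiceException}} so the example compiles on its own, and {{unwrap}} is a hypothetical helper name that does not exist in RpcServer.

```java
import java.io.IOException;

public class CauseUnwrapSketch {
    // Hypothetical stand-in for com.google.protobuf.ServiceException,
    // so the sketch compiles without protobuf on the classpath.
    static class ServiceException extends Exception {
        ServiceException(String message) { super(message); }
        ServiceException(Throwable cause) { super(cause); }
    }

    // Hypothetical helper: replace e with its cause only when a cause exists.
    // A ServiceException built from a message alone is returned unchanged,
    // so the caller never ends up holding a null Throwable.
    static Throwable unwrap(Throwable e) {
        if (e instanceof ServiceException && e.getCause() != null) {
            return e.getCause();
        }
        return e;
    }

    public static void main(String[] args) {
        Throwable noCause =
            new ServiceException("Replication services are not initialized yet");
        Throwable withCause = new ServiceException(new IOException("boom"));

        // No cause: the ServiceException itself is kept.
        System.out.println(unwrap(noCause) == noCause);
        // Cause present: the wrapped IOException is surfaced.
        System.out.println(unwrap(withCause) instanceof IOException);
    }
}
```

Keeping the original exception when there is no cause means the later {{instanceof}} checks and the metrics call always receive a non-null Throwable.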
> NPE in RpcServer causing intermittent UT failure of TestMasterReplication#testHFileCyclicReplication
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-16201
> URL: https://issues.apache.org/jira/browse/HBASE-16201
> Project: HBase
> Issue Type: Bug
> Reporter: Yu Li
> Assignee: Yu Li
> Attachments: HBASE-16201.patch
>
>
> Once every several runs of {{TestMasterReplication#testHFileCyclicReplication}}, the following NPE shows up in the UT log:
> {noformat}
> java.lang.NullPointerException
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2257)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:118)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:189)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:169)
> {noformat}
> The related code around RpcServer line 2257 is:
> {code}
> if (e instanceof ServiceException) {
> e = e.getCause();
> }
> // increment the number of requests that were exceptions.
> metrics.exception(e);
> if (e instanceof LinkageError) throw new DoNotRetryIOException(e);
> if (e instanceof IOException) throw (IOException)e;
> {code}
> After some debugging, we found several places that construct a ServiceException with no cause, such as {{RSRpcServices#replicateWALEntry}}:
> {code}
> if (regionServer.replicationSinkHandler != null) {
> ...
> } else {
> throw new ServiceException("Replication services are not initialized yet");
> }
> {code}
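> For such an exception, {{e.getCause()}} returns null, so {{e}} itself becomes null before the code that follows uses it. We should therefore check the cause first and only reset {{e=e.getCause()}} when the cause is not null.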
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)