You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hbase.apache.org by "Samir Ahmic (JIRA)" <ji...@apache.org> on 2015/05/28 14:27:29 UTC
[jira] [Created] (HBASE-13793) Regionserver unable to report to
master when master is restarted
Samir Ahmic created HBASE-13793:
-----------------------------------
Summary: Regionserver unable to report to master when master is restarted
Key: HBASE-13793
URL: https://issues.apache.org/jira/browse/HBASE-13793
Project: HBase
Issue Type: Bug
Components: IPC/RPC
Affects Versions: 2.0.0
Environment: x86_64 GNU/Linux
Reporter: Samir Ahmic
Priority: Critical
Fix For: 2.0.0
I was testing master branch on distributed cluster and i notice that when master is restarted on running cluster regionservers are unable report back when master is up again.
Things back to normal after i restarted regionservers. Logs showing that regionservers are correctly detecting master znode.
After some digging i notice that we have changed client implementation in RpcClientFactory to AsyncRpcClient so i have tried running cluster with previous RpcClientImpl and issue was gone.
So issue is probably caused by AsyncRpcClient which is unable reconnect to master once original connection is gone.
I was able to fix issue by creating new rpcClient object inside HRegionServer#createRegionServerStatusStub() and using it for channel creation here is diff:
{code}
diff --git a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index fa56966..27e658c 100644
--- a/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2219,8 +2219,11 @@ public class HRegionServer extends HasThread implements
break;
}
try {
+ LOG.info("***Creating new client connection");
+ rpcClient = RpcClientFactory.createClient(conf, clusterId, new InetSocketAddress(
+ rpcServices.isa.getAddress(), 0));
BlockingRpcChannel channel =
- this.rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
+ rpcClient.createBlockingRpcChannel(sn, userProvider.getCurrent(),
shortOperationTimeout);
intf = RegionServerStatusService.newBlockingStub(channel);
break;
{code}
If this is acceptable way for fixing this issue i will create and attach patch?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)