Posted to dev@hbase.apache.org by "Pankaj Kumar (JIRA)" <ji...@apache.org> on 2016/10/11 04:02:20 UTC
[jira] [Created] (HBASE-16807) RegionServer will fail to report new
active Hmaster until HMaster/RegionServer failover
Pankaj Kumar created HBASE-16807:
------------------------------------
Summary: RegionServer will fail to report new active Hmaster until HMaster/RegionServer failover
Key: HBASE-16807
URL: https://issues.apache.org/jira/browse/HBASE-16807
Project: HBase
Issue Type: Bug
Reporter: Pankaj Kumar
It is a little odd, but it happened in a production environment: a few RegionServers missed the master znode creation notification during a master failover. In that case ZooKeeperNodeTracker does not refresh its cached data, and MasterAddressTracker
keeps returning the old active HMaster's details to the RegionServer on every ServiceException.
We do recreate the region server status stub on failure, but without refreshing the MasterAddressTracker data.
In HRegionServer.createRegionServerStatusStub()
{code}
boolean refresh = false; // for the first time, use cached data
RegionServerStatusService.BlockingInterface intf = null;
boolean interrupted = false;
try {
  while (keepLooping()) {
    sn = this.masterAddressTracker.getMasterAddress(refresh);
    if (sn == null) {
      if (!keepLooping()) {
        // give up with no connection.
        LOG.debug("No master found and cluster is stopped; bailing out");
        return null;
      }
      if (System.currentTimeMillis() > (previousLogTime + 1000)) {
        LOG.debug("No master found; retry");
        previousLogTime = System.currentTimeMillis();
      }
      refresh = true; // let's try pull it from ZK directly
      if (sleep(200)) {
        interrupted = true;
      }
      continue;
    }
{code}
Here the node is refreshed only when 'sn' is null; otherwise the same cached data is reused.
So in the case above, the RegionServer will never successfully report to the new active HMaster until another HMaster failover or a RegionServer restart.
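To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use HBase or ZooKeeper classes: `CachingTracker`, `zk`, `report`, and both `connect...` methods are hypothetical stand-ins invented for illustration. It contrasts a retry loop that refreshes only when the cached address is null (the behaviour described above) with one that also refreshes after a failed call, which is one possible direction for a fix and not necessarily how the project resolved this issue.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simulation; none of these names are HBase APIs.
class StaleMasterSketch {
    // Stand-in for the master znode: address of the currently active master.
    static final AtomicReference<String> zk = new AtomicReference<>("master-1:16000");

    // Stand-in for a tracker that caches the master address and only
    // re-reads it when explicitly asked to refresh.
    static class CachingTracker {
        private String cached = zk.get();

        String getMasterAddress(boolean refresh) {
            if (refresh) {
                cached = zk.get();
            }
            return cached;
        }
    }

    // Pretend RPC: succeeds only when sent to the active master.
    static boolean report(String master) {
        return master != null && master.equals(zk.get());
    }

    // Mirrors the reported behaviour: refresh is set only when the address is null.
    static String connectRefreshOnlyOnNull(CachingTracker tracker, int maxAttempts) {
        boolean refresh = false;
        for (int i = 0; i < maxAttempts; i++) {
            String sn = tracker.getMasterAddress(refresh);
            if (sn == null) {
                refresh = true;
                continue;
            }
            if (report(sn)) {
                return sn;
            }
            // Failure with a non-null address: refresh stays false,
            // so the next attempt reuses the same stale value.
        }
        return null;
    }

    // Assumed alternative: also force a refresh after a failed call.
    static String connectRefreshOnFailure(CachingTracker tracker, int maxAttempts) {
        boolean refresh = false;
        for (int i = 0; i < maxAttempts; i++) {
            String sn = tracker.getMasterAddress(refresh);
            if (sn != null && report(sn)) {
                return sn;
            }
            refresh = true; // null or stale address: re-read on the next attempt
        }
        return null;
    }

    public static void main(String[] args) {
        CachingTracker tracker = new CachingTracker();
        zk.set("master-2:16000"); // failover happens, notification is missed

        System.out.println(connectRefreshOnlyOnNull(tracker, 5)); // prints "null"
        System.out.println(connectRefreshOnFailure(tracker, 5));  // prints "master-2:16000"
    }
}
```

In the first loop the tracker keeps handing back master-1 on every attempt, so the call never succeeds; in the second, a single failed attempt triggers a re-read and the next attempt reaches master-2.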