Posted to issues@hbase.apache.org by "Jimmy Xiang (JIRA)" <ji...@apache.org> on 2013/05/13 19:23:16 UTC
[jira] [Created] (HBASE-8537) Dead region server pulled in from ZK
Jimmy Xiang created HBASE-8537:
----------------------------------
Summary: Dead region server pulled in from ZK
Key: HBASE-8537
URL: https://issues.apache.org/jira/browse/HBASE-8537
Project: HBase
Issue Type: Bug
Components: master
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
When a cluster restarts quickly after a crash, the master still pulls in the dead region server instance from zk, even though a new instance on the same host and port has already reported in.
{noformat}
2013-05-12 18:32:52,996 INFO [IPC Server handler 6 on 36000] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368408767773
....
2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.HMaster: Registering server found up in zk but who has not yet reported in: a1217.halxg.cloudera.com,36020,1368378273768
2013-05-12 18:32:54,653 INFO [master-a1220.halxg.cloudera.com,36000,1368408767520] org.apache.hadoop.hbase.master.ServerManager: Registering server=a1217.halxg.cloudera.com,36020,1368378273768
{noformat}
We should not pull in the second region server instance from zk; it is actually dead. We can detect this from the hostname and the port, assuming that no two region server instances can be alive on the same host and port at the same time. To be more cautious, we can check the timestamp as well: the live instance should be the one with the later timestamp, not the one pulled in from zk.
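The check proposed above could look roughly like the following. This is only a minimal sketch, not the actual HMaster/ServerManager code or the eventual patch: StaleServerFilter, Instance and selectFromZk are hypothetical names made up for illustration, and the only thing taken from the log is the "hostname,port,startcode" server name format. It assumes that a zk entry should be skipped whenever a server with the same host and port and an equal or later start code has already reported in.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not HBase code: decides which server names found in zk
// may still be registered, given the servers that have already reported in.
final class StaleServerFilter {

    // Parsed form of "hostname,port,startcode", as printed in the master log.
    static final class Instance {
        final String host;
        final int port;
        final long startCode;

        Instance(String host, int port, long startCode) {
            this.host = host;
            this.port = port;
            this.startCode = startCode;
        }

        static Instance parse(String serverName) {
            String[] parts = serverName.split(",");
            if (parts.length != 3) {
                throw new IllegalArgumentException("Bad server name: " + serverName);
            }
            return new Instance(parts[0], Integer.parseInt(parts[1]), Long.parseLong(parts[2]));
        }

        String hostAndPort() {
            return host + ":" + port;
        }
    }

    // Returns the zk entries that are not superseded by a reported-in server
    // on the same host and port (assumption: only one live instance per host:port).
    static List<String> selectFromZk(List<String> reportedIn, List<String> foundInZk) {
        // Newest start code seen so far for each host:port that has reported in.
        Map<String, Long> newestReported = new HashMap<>();
        for (String name : reportedIn) {
            Instance i = Instance.parse(name);
            newestReported.merge(i.hostAndPort(), i.startCode, Long::max);
        }

        List<String> keep = new ArrayList<>();
        for (String name : foundInZk) {
            Instance i = Instance.parse(name);
            Long reported = newestReported.get(i.hostAndPort());
            // Keep the zk entry only if nothing has reported in on this host:port,
            // or if the zk entry is strictly newer than what has reported in.
            if (reported == null || i.startCode > reported) {
                keep.add(name);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<String> reportedIn = new ArrayList<>();
        reportedIn.add("a1217.halxg.cloudera.com,36020,1368408767773");

        List<String> foundInZk = new ArrayList<>();
        foundInZk.add("a1217.halxg.cloudera.com,36020,1368378273768");

        // The zk entry has an older start code on the same host:port, so it is dropped.
        System.out.println(selectFromZk(reportedIn, foundInZk));
    }
}
```

With the two a1217 entries from the log, the zk entry (start code 1368378273768) is older than the one that reported in (1368408767773), so it is filtered out. Keeping a strictly newer zk entry is a design choice of this sketch; a more conservative variant could drop every zk entry whose host and port match a reported-in server.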