Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org> on 2012/05/18 16:49:09 UTC
[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.
[ https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278839#comment-13278839 ]
ramkrishna.s.vasudevan commented on HBASE-6046:
-----------------------------------------------
The problem here is that when the master retries to recover from the ZK session expiry exception and succeeds, the master is almost entirely recreated, in the sense that:
{code}
try {
  if (!becomeActiveMaster(status)) {
    return Boolean.FALSE;
  }
  initializeZKBasedSystemTrackers();
  // Update in-memory structures to reflect our earlier Root/Meta assignment.
  assignRootAndMeta(status);
  // process RIT if any
  // TODO: Why does this not call AssignmentManager.joinCluster? Otherwise
  // we are not processing dead servers if any.
  assignmentManager.processDeadServersAndRegionsInTransition();
{code}
Here initializeZKBasedSystemTrackers() even creates a new AssignmentManager, so whatever the master does in processDeadServersAndRegionsInTransition() is effectively a fresh start.
So in processDeadServersAndRegionsInTransition():
{code}
for (Map.Entry<HRegionInfo, ServerName> e: this.regions.entrySet()) {
  if (!e.getKey().isMetaTable()
      && e.getValue() != null) {
    LOG.debug("Found " + e + " out on cluster");
    this.failover = true;
    break;
  }
}
{code}
Even though all the RSs are online, 'this.regions' is empty in the newly created AssignmentManager, so this check does not set failover to true and the master goes ahead with a completely new assignment.
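
The effect of the check above can be sketched in isolation. This is not the HBase code: FailoverCheckSketch, detectFailover() and the String-keyed map are hypothetical stand-ins for the AssignmentManager, its failover loop and 'this.regions', and the region and server names are made up for illustration.

{code}
import java.util.HashMap;
import java.util.Map;

class FailoverCheckSketch {
  // Hypothetical stand-in for the loop above: returns true as soon as one
  // non-catalog region is recorded as deployed on some server.
  static boolean detectFailover(Map<String, String> regions) {
    for (Map.Entry<String, String> e : regions.entrySet()) {
      // Assumption: catalog regions are recognised by name prefix here;
      // the real code calls HRegionInfo.isMetaTable().
      boolean isMeta = e.getKey().startsWith("-ROOT-")
          || e.getKey().startsWith(".META.");
      if (!isMeta && e.getValue() != null) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    // In-memory state of the AssignmentManager before the ZK session expiry.
    Map<String, String> beforeExpiry = new HashMap<>();
    beforeExpiry.put("t1,,1337000000000", "rs1,60020,1337000000000");
    System.out.println(detectFailover(beforeExpiry)); // prints true

    // A freshly created AssignmentManager starts with an empty map, so
    // nothing is "found out on cluster" even though the RSs are online.
    Map<String, String> afterRetry = new HashMap<>();
    System.out.println(detectFailover(afterRetry)); // prints false
  }
}
{code}

With the empty map the check falls through, which matches the behaviour described above: the retried master does not see the regions that are already deployed.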
> Master retry on ZK session expiry causes inconsistent region assignments.
> -------------------------------------------------------------------------
>
> Key: HBASE-6046
> URL: https://issues.apache.org/jira/browse/HBASE-6046
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.92.1, 0.94.0
> Reporter: Gopinathan A
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.92.2, 0.94.1
>
>
> 1> A ZK session timeout in the HMaster leads to bulk assignment even though all the RSs are online.
> 2> While doing the bulk assignment, if the master goes down again and restarts (or a backup comes up), all the nodes created in ZK are now reassigned to new RSs. This leads to double assignment.
> We had 2800 regions; of these, 1900 regions got double assignment, taking the region count to 4700.