Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org> on 2012/05/18 16:49:09 UTC

[jira] [Commented] (HBASE-6046) Master retry on ZK session expiry causes inconsistent region assignments.

    [ https://issues.apache.org/jira/browse/HBASE-6046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13278839#comment-13278839 ] 

ramkrishna.s.vasudevan commented on HBASE-6046:
-----------------------------------------------

The problem here is that when the master retries after a ZK session expiry exception and succeeds, the master is almost entirely re-created, in the following sense:
{code}
try {
  if (!becomeActiveMaster(status)) {
    return Boolean.FALSE;
  }
  initializeZKBasedSystemTrackers();
  // Update in-memory structures to reflect our earlier Root/Meta assignment.
  assignRootAndMeta(status);
  // process RIT if any
  // TODO: Why does this not call AssignmentManager.joinCluster?  Otherwise
  // we are not processing dead servers if any.
  assignmentManager.processDeadServersAndRegionsInTransition();
  // ... (rest of the try block and its catch clause elided)
{code}

Here, initializeZKBasedSystemTrackers() even creates a new AssignmentManager, so whatever processDeadServersAndRegionsInTransition() does afterwards amounts to a fresh start.
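To make the effect concrete, here is a minimal Java sketch, assuming a toy master whose only relevant state is an in-memory region-to-server map held by its assignment manager. ToyMaster and ToyAssignmentManager are hypothetical stand-ins, not HBase classes; the point is only that swapping in a new instance discards whatever the old one knew about the cluster.

```java
import java.util.HashMap;
import java.util.Map;

public class ReinitSketch {
    // Hypothetical stand-in for AssignmentManager: holds only an in-memory region -> server map.
    static class ToyAssignmentManager {
        final Map<String, String> regions = new HashMap<>();
    }

    // Hypothetical stand-in for HMaster.
    static class ToyMaster {
        ToyAssignmentManager assignmentManager = new ToyAssignmentManager();

        // Models the reported behaviour: re-initialising the trackers builds a brand-new AM.
        void initializeZKBasedSystemTrackers() {
            this.assignmentManager = new ToyAssignmentManager();
        }
    }

    public static void main(String[] args) {
        ToyMaster master = new ToyMaster();
        // Before the ZK session expires, the AM knows where a user region lives.
        master.assignmentManager.regions.put("user-region-1", "rs1");
        System.out.println("before: " + master.assignmentManager.regions.size()); // before: 1
        // Recovery path after the expiry: the old AM and its map are dropped.
        master.initializeZKBasedSystemTrackers();
        System.out.println("after: " + master.assignmentManager.regions.size()); // after: 0
    }
}
```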
So, in processDeadServersAndRegionsInTransition():
{code}
for (Map.Entry<HRegionInfo, ServerName> e: this.regions.entrySet()) {
  if (!e.getKey().isMetaTable()
      && e.getValue() != null) {
    LOG.debug("Found " + e + " out on cluster");
    this.failover = true;
    break;
  }
}
{code}

Even though all the RSs are online, 'this.regions' is empty in the newly created AssignmentManager, so the loop never sets this.failover and we go for a completely new assignment.
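A minimal Java sketch of the check quoted above, assuming simplified types: region and server names are plain strings standing in for HRegionInfo and ServerName, and the hypothetical metaRegions set replaces isMetaTable(). It returns a boolean instead of setting a field and breaking, which is equivalent for this purpose. Fed the empty map of a freshly created AssignmentManager, it reports no failover even though the real cluster has user regions online.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class FailoverCheckSketch {
    // Simplified model of the quoted loop (hypothetical types: String instead of
    // HRegionInfo/ServerName, metaRegions instead of isMetaTable()).
    static boolean detectFailover(Map<String, String> regions, Set<String> metaRegions) {
        for (Map.Entry<String, String> e : regions.entrySet()) {
            if (!metaRegions.contains(e.getKey()) && e.getValue() != null) {
                return true; // a user region is already hosted: treat startup as failover
            }
        }
        return false; // nothing found: the master falls back to a fresh bulk assignment
    }

    public static void main(String[] args) {
        Set<String> meta = Set.of("meta-region");
        // The recreated AssignmentManager starts with an empty map...
        Map<String, String> fresh = new HashMap<>();
        System.out.println(detectFailover(fresh, meta)); // false
        // ...even though the cluster actually looks like this.
        Map<String, String> actual = new HashMap<>();
        actual.put("meta-region", "rs1");
        actual.put("user-region-1", "rs2");
        System.out.println(detectFailover(actual, meta)); // true
    }
}
```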
                
> Master retry on ZK session expiry causes inconsistent region assignments.
> -------------------------------------------------------------------------
>
>                 Key: HBASE-6046
>                 URL: https://issues.apache.org/jira/browse/HBASE-6046
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.92.1, 0.94.0
>            Reporter: Gopinathan A
>            Assignee: ramkrishna.s.vasudevan
>             Fix For: 0.92.2, 0.94.1
>
>
> 1> A ZK session timeout in the HMaster leads to bulk assignment even though all the RSs are online.
> 2> While the bulk assignment is in progress, if the master goes down again and restarts (or a backup master comes up), all the nodes created in ZK will now be reassigned to new RSs. This leads to double assignment.
> We had 2800 regions; of these, 1900 were double-assigned, taking the region count to 4700.
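The reported figures are consistent with each double-assigned region adding exactly one extra open copy: 2800 + 1900 = 4700. A minimal Java sketch of that arithmetic, assuming a toy model in which every region starts open on one server and each double-assigned region is opened once more on another server without being closed on the first. The class, method, and server names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DoubleAssignmentSketch {
    // Hypothetical model: counts open region instances after the first `reassigned`
    // regions are opened again on a new RS without being closed on the old one.
    static int countOpenInstances(int totalRegions, int reassigned) {
        Map<Integer, List<String>> openOn = new HashMap<>();
        for (int r = 0; r < totalRegions; r++) {
            List<String> servers = new ArrayList<>();
            servers.add("oldRS"); // every region is already online somewhere
            openOn.put(r, servers);
        }
        for (int r = 0; r < reassigned; r++) {
            openOn.get(r).add("newRS"); // reassigned again: now open twice
        }
        int instances = 0;
        for (List<String> servers : openOn.values()) {
            instances += servers.size();
        }
        return instances;
    }

    public static void main(String[] args) {
        System.out.println(countOpenInstances(2800, 1900)); // prints 4700
    }
}
```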

--
This message is automatically generated by JIRA.