Posted to dev@hbase.apache.org by "Abhishek Singh Chouhan (JIRA)" <ji...@apache.org> on 2015/11/26 08:32:11 UTC

[jira] [Created] (HBASE-14889) Region stuck in transition in OPEN state indefinitely in corner scenario

Abhishek Singh Chouhan created HBASE-14889:
----------------------------------------------

             Summary: Region stuck in transition in OPEN state indefinitely in corner scenario
                 Key: HBASE-14889
                 URL: https://issues.apache.org/jira/browse/HBASE-14889
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.14
            Reporter: Abhishek Singh Chouhan


During a failure scenario, when an RS dies and the bulk assigner (BA) is assigning its regions to other RSs, if another RS dies (one to which some of those regions are being moved) while a region is in the pending open state, we can end up in a situation where two bulk assigners try to assign the same region on the same RS.

The following sequence of events occurred:
1. While one BA is opening the region, the second one sees it in the pending open state, retries, and calls unassign(...), thereby sending a CLOSE RPC to the RS.
2. Meanwhile, the RS has already opened the region and changed the znode state to RS_ZK_REGION_OPENED, which triggers an event on the master.
3. On the master, after the unassign succeeds, we go on to delete the znode, change the region state to PENDING_OPEN, and send an OPEN RPC to the RS.
4. The earlier triggered event now sees the state as PENDING_OPEN and happily changes it to OPEN, but it is unable to delete the znode, which by this time is no longer in the RS_ZK_REGION_OPENED state but in the M_ZK_REGION_OFFLINE state. Hence the region remains in transition in the OPEN state.
5. The RS goes on to change the znode states and successfully opens the region (changing the znode state to RS_ZK_REGION_OPENED).
6. This again triggers an event on the master, but this time, since the state is OPEN, the following code path is taken:
{noformat}
case RS_ZK_REGION_OPENED:
  // Should see OPENED after OPENING but possible after PENDING_OPEN.
  if (regionState == null
      || !regionState.isPendingOpenOrOpeningOnServer(sn)) {
    LOG.warn("Received OPENED for " + prettyPrintedRegionName
      + " from " + sn + " but the region isn't PENDING_OPEN/OPENING here: "
      + regionStates.getRegionState(encodedName));

    if (regionState != null) {
      // Close it without updating the internal region states,
      // so as not to create double assignments in unlucky scenarios
      // mentioned in OpenRegionHandler#process
      unassign(regionState.getRegion(), null, -1, null, false, sn);
    }
    return;
  }
{noformat}
Here unassign(...) is called with transitionInZK=false and state=null.
7. The RS closes the region but doesn't update ZK, and the region state is not changed on the master either. The region remains in transition in the OPEN state when it is actually closed. We have to restart the RS, after which the region opens correctly on some other RS.
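The interleaving in steps 1 to 7 can be replayed with a small, deterministic Java model. This is a hypothetical sketch, not HBase code: the class RegionRitSketch, its fields, and its methods are invented for illustration. The enum constants borrow the state names used above, while the conditional znode delete and the OPENED guard are simplified stand-ins (the server check in isPendingOpenOrOpeningOnServer is omitted, and ZK watcher events are modeled as explicit method calls). Replaying the steps in the reported order leaves the master holding the region as OPEN and in transition while the RS no longer hosts it.

```java
import java.util.EnumSet;

// Hypothetical, simplified model of the master/ZK/RS interleaving in this report.
// None of these names are HBase APIs.
public class RegionRitSketch {
    // Subset of master-side region states.
    enum MasterState { PENDING_OPEN, OPENING, OPEN }

    // Znode states named after the ones in the report.
    enum ZnodeState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING, RS_ZK_REGION_OPENED }

    MasterState masterState = MasterState.PENDING_OPEN; // BA1 has sent the OPEN RPC
    ZnodeState znode = ZnodeState.RS_ZK_REGION_OPENING;  // RS is opening the region
    boolean inTransition = true;                          // region is in RIT on the master
    boolean openOnRs = false;                             // what the RS actually hosts

    // RS finishes opening and flips the znode to OPENED. In HBase this fires a
    // ZK watcher event on the master; here the handler is called explicitly.
    void rsCompletesOpen() {
        openOnRs = true;
        znode = ZnodeState.RS_ZK_REGION_OPENED;
    }

    // Simplified master handling of an RS_ZK_REGION_OPENED event.
    void masterHandlesOpened() {
        // Simplified form of the PENDING_OPEN/OPENING guard (server check omitted).
        if (!EnumSet.of(MasterState.PENDING_OPEN, MasterState.OPENING).contains(masterState)) {
            // Step 6 path: unassign with transitionInZK=false closes the region on
            // the RS but leaves both the znode and the master state untouched.
            openOnRs = false;
            return;
        }
        masterState = MasterState.OPEN;
        // The znode delete only succeeds if the znode is still in the OPENED state.
        if (znode == ZnodeState.RS_ZK_REGION_OPENED) {
            znode = null;          // znode deleted
            inTransition = false;  // region leaves RIT
        }
    }

    // Replays steps 1-7 in the order reported.
    void replay() {
        // Steps 1-2: BA2 sends CLOSE while the RS has already opened the region;
        // the resulting OPENED event is not handled by the master yet.
        rsCompletesOpen();
        // Step 3: the unassign succeeds, the znode is recreated as OFFLINE,
        // the state goes back to PENDING_OPEN and a new OPEN RPC is sent.
        openOnRs = false;
        znode = ZnodeState.M_ZK_REGION_OFFLINE;
        masterState = MasterState.PENDING_OPEN;
        // Step 4: the stale OPENED event from step 2 is handled now.
        masterHandlesOpened();
        // Step 5: the RS opens the region again for the new OPEN RPC.
        znode = ZnodeState.RS_ZK_REGION_OPENING;
        rsCompletesOpen();
        // Steps 6-7: the new OPENED event sees OPEN and closes the region on the RS.
        masterHandlesOpened();
    }

    public static void main(String[] args) {
        RegionRitSketch s = new RegionRitSketch();
        s.replay();
        System.out.println("master=" + s.masterState
                + ", inTransition=" + s.inTransition
                + ", openOnRs=" + s.openOnRs);
    }
}
```

Running main prints `master=OPEN, inTransition=true, openOnRs=false`, which is the stuck state described in step 7. The model makes the root cause visible: the stale event from step 2 is handled after step 3 has reset the znode, so the state change to OPEN succeeds while the znode cleanup does not.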



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)