Posted to issues@hbase.apache.org by "ramkrishna.s.vasudevan (Created) (JIRA)" <ji...@apache.org> on 2011/12/26 07:44:30 UTC

[jira] [Created] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
--------------------------------------------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-5094
                 URL: https://issues.apache.org/jira/browse/HBASE-5094
             Project: HBase
          Issue Type: Bug
            Reporter: Ted Yu


R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3, and both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2, and RS2 reports ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.

T1: Load balancer tried to move R1 from RS1 to RS2. 2011-11-21 14:03:20,812 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., src=skynet-1,60020,1321912978281, dest=skynet-4,60020,1321912999305

T2: RS1 shutdown. 2011-11-21 14:03:24,759 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=skynet-1,60020,1321912978281 to dead servers, submitted shutdown handler to be executed, root=false, meta=true

T3: R1 is opened on RS2. 2011-11-21 14:03:26,131 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. that was online on skynet-4,60020,1321912999305

T4: As part of RS1 shutdown handling, region reassignment starts. It uses the region location captured at T2. 2011-11-21 14:03:26,152 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 32 region(s) that skynet-1,60020,1321912978281 was carrying (skipping 0 regions(s) that are already in transition)

T5: R1 is assigned to RS3. 2011-11-21 14:03:27,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x133b84f9f490000 Retrieved 115 byte(s) of data from znode /hbase/unassigned/ee2e205a60f1bb06cc73bc9df06289df; data=region=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., origin=skynet-3,60020,1321912991430, state=RS_ZK_REGION_OPENED

T6: RS3 shutdown. R1 is reassigned to RS2. 2011-11-21 14:03:37,899 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: ALREADY_OPENED region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. to skynet-4,60020,1321912999305

From the AssignmentManager's point of view, R1 is assigned to RS2. The .META. table indicates the location is RS3.
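
The T1 to T6 sequence above can be replayed in a small toy model to see how the two views end up disagreeing. This is only a sketch: the class, its maps, and its method names are hypothetical and are not HBase APIs, and it assumes that a normal open updates both views while an ALREADY_OPENED response only updates the master's in-memory view.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy model only: these maps stand in for .META. and the AssignmentManager's
// in-memory view. None of these names are HBase APIs.
final class LocationDivergenceDemo {
    final Map<String, String> meta = new HashMap<>();
    final Map<String, String> assignmentManager = new HashMap<>();

    // Assumption: a normal open updates both .META. and the master's view.
    void opened(String region, String server) {
        meta.put(region, server);
        assignmentManager.put(region, server);
    }

    // Assumption: on ALREADY_OPENED only the master's view changes,
    // because no new open happens on the region server.
    void alreadyOpened(String region, String server) {
        assignmentManager.put(region, server);
    }

    boolean consistent(String region) {
        return Objects.equals(meta.get(region), assignmentManager.get(region));
    }

    static LocationDivergenceDemo replay() {
        LocationDivergenceDemo d = new LocationDivergenceDemo();
        d.opened("R1", "RS2");        // T3: R1 opens on RS2 after the balancer move
        d.opened("R1", "RS3");        // T5: shutdown handling reassigns R1 to RS3
        d.alreadyOpened("R1", "RS2"); // T6: RS3 is gone, RS2 answers ALREADY_OPENED
        return d;
    }

    public static void main(String[] args) {
        LocationDivergenceDemo d = replay();
        System.out.println("meta=" + d.meta.get("R1")
            + " am=" + d.assignmentManager.get("R1")
            + " consistent=" + d.consistent("R1"));
    }
}
```

After the replay, the stand-in for .META. still points at RS3 while the stand-in for the AssignmentManager points at RS2, which is the mismatch described in this report.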



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Reporter: ramkrishna.s.vasudevan  (was: Ted Yu)
    

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Attachment: HBASE-5094_1.patch
    
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the address from the AM.  This address is initially null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet run on the master side.
> But removal from RIT has already completed on the master side, so this triggers a new assignment.
> So there is a small window, before the region is actually added to the online list, in which ServerShutdownHandler can check the existing address in the AM and find none.
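
The window described above can be illustrated with a deterministic toy model. This is a sketch under stated assumptions, not the HBase implementation: the class, field, and method names are hypothetical, the CLOSING and PENDING_CLOSE cases from the snippet are left out for brevity, and the two master-side steps are run one after the other so the check can be placed between them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical stand-ins for the master's regions-in-transition
// set and its region-to-server map. CLOSING/PENDING_CLOSE handling is omitted.
final class ShutdownWindowSketch {
    final Set<String> regionsInTransition = new HashSet<>();
    final Map<String, String> onlineRegions = new HashMap<>();

    // Same shape as the check quoted above: true means the shutdown handler
    // would go on to reassign the region.
    boolean wouldReassign(String region, String deadServer) {
        if (regionsInTransition.contains(region)) {
            return false; // skip: region is in transition
        }
        String addressFromAM = onlineRegions.get(region);
        if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return false; // skip: region is already open elsewhere
        }
        return true;
    }

    // Returns the check result {inside the window, after the window}.
    static boolean[] observeWindow() {
        ShutdownWindowSketch s = new ShutdownWindowSketch();
        s.regionsInTransition.add("R1");     // R1 is opening on RS2
        s.regionsInTransition.remove("R1");  // step 1: removed from RIT
        boolean inside = s.wouldReassign("R1", "RS1"); // handler runs here
        s.onlineRegions.put("R1", "RS2");    // step 2: added to the online list
        boolean after = s.wouldReassign("R1", "RS1");
        return new boolean[] {inside, after};
    }

    public static void main(String[] args) {
        boolean[] r = observeWindow();
        System.out.println("reassign inside window: " + r[0]);
        System.out.println("reassign after window: " + r[1]);
    }
}
```

In this model the check reassigns R1 only when it runs between the two steps, which matches the double assignment reported here.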


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176073#comment-13176073 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

Can we swap the order of "removing from RIT" and "adding to the region set"?
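
To make the question concrete, here is a toy comparison of the two orderings. It is only a sketch: the names are hypothetical, the check is a simplified form of the ServerShutdownHandler condition (CLOSING and PENDING_CLOSE are ignored), and it does not show what the attached patch actually does.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: compares the two orderings of the master-side bookkeeping.
final class OrderingSketch {
    // Returns true if a shutdown-handler style check that runs between the
    // two bookkeeping steps would reassign region R1 (dead server is RS1).
    static boolean reassignsMidUpdate(boolean addOnlineFirst) {
        Set<String> rit = new HashSet<>();
        Map<String, String> online = new HashMap<>();
        rit.add("R1"); // R1 is opening on RS2

        // Perform only the first of the two steps, in the chosen order.
        if (addOnlineFirst) {
            online.put("R1", "RS2");
        } else {
            rit.remove("R1");
        }

        // Simplified check, evaluated between the two steps.
        String address = online.get("R1");
        boolean skip = rit.contains("R1")
            || (address != null && !address.equals("RS1"));
        return !skip;
    }

    public static void main(String[] args) {
        System.out.println("RIT removed first: " + reassignsMidUpdate(false));
        System.out.println("online added first: " + reassignsMidUpdate(true));
    }
}
```

With the region added to the online set first, the intermediate state still has the region in RIT, so the simplified check skips it. Whether the real code has other readers that depend on the current order is not covered by this sketch.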
                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176435#comment-13176435 ] 

Hadoop QA commented on HBASE-5094:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508715/5094.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -151 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 77 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
                       org.apache.hadoop.hbase.mapred.TestTableMapReduce

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/605//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/605//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/605//console

This message is automatically generated.
                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hadoop QA (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177535#comment-13177535 ] 

Hadoop QA commented on HBASE-5094:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12508904/HBASE-5094_1.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 javadoc.  The javadoc tool appears to have generated -151 warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 76 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapred.TestTableMapReduce
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/638//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/638//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/638//console

This message is automatically generated.
                

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Comment: was deleted

(was: +1 on patch.)
    

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Priority: Critical  (was: Major)
    

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

          Description: 
{code}
RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
ServerName addressFromAM = this.services.getAssignmentManager()
    .getRegionServerOfRegion(e.getKey());
if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
  // Skip regions that were in transition unless CLOSING or
  // PENDING_CLOSE
  LOG.info("Skip assigning region " + rit.toString());
} else if (addressFromAM != null
    && !addressFromAM.equals(this.serverName)) {
  LOG.debug("Skip assigning region "
      + e.getKey().getRegionNameAsString()
      + " because it has been opened in "
      + addressFromAM.getServerName());
}
{code}
In ServerShutdownHandler we try to get the address from the AM.  This address is initially null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet run on the master side.
But removal from RIT has already completed on the master side, so this triggers a new assignment.
So there is a small window, before the region is actually added to the online list, in which ServerShutdownHandler can check the existing address in the AM and find none.

  was:
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3, and both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2, and RS2 reports ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.



1) Region R1 is assigned from RS1 to RS2.
2) RS1 goes down and ServerShutdownHandler runs.  ServerShutdownHandler finds R1 on RS1 in META, because META has not yet been updated to RS2.
3) Because RS1 went down, R1 is reassigned from RS1 to RS3.
4) RS3 goes down. ServerShutdownHandler processes R1 and tries to assign it to RS2.
5) RS2 says ALREADY_OPENED but META shows RS3.

I was able to reproduce the scenario in 0.92.





    Affects Version/s: 0.92.0
    

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Ming Ma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179258#comment-13179258 ] 

Ming Ma commented on HBASE-5094:
--------------------------------

It is a tricky bug. I tend to agree with Stack here. Perhaps we can enforce synchronization for region assignment. Here is some additional background info.

1. How the bug was found: rolling restart of the RSs with a regular shutdown (not kill -9). After running for a couple of hours, one user region was missing. I then identified the event sequence from the logs across machines.

2. I put a quick fix into AssignmentManager and ServerShutdownHandler a couple of weeks ago (not submitted to open source). It reduces the chance of this error, but doesn't completely address the synchronization issue.

3. How we can verify that the fix works: besides code review and unit tests, I think it is better to run the rolling-restart RS script for a long period of time, say a couple of days.
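
For a long-running check like the one in point 3, one possible approach is to periodically compare the region locations recorded in .META. with the locations the AssignmentManager holds in memory. The sketch below uses plain maps as stand-ins for both views; `LocationAudit` and `findMismatches` are hypothetical names for illustration, not HBase APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: compares two views of region -> server assignments. */
final class LocationAudit {

    /**
     * Returns every region whose server in META differs from the server in
     * the in-memory (AssignmentManager) view, mapped to a short description.
     * Regions known only to the in-memory view are not reported here.
     */
    static Map<String, String> findMismatches(Map<String, String> meta,
                                              Map<String, String> am) {
        Map<String, String> mismatches = new TreeMap<>();
        for (Map.Entry<String, String> e : meta.entrySet()) {
            String amServer = am.get(e.getKey()); // null if the AM has no entry
            if (!e.getValue().equals(amServer)) {
                mismatches.put(e.getKey(), e.getValue() + " != " + amServer);
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("R1", "RS3"); // stale entry, as in this issue
        meta.put("R2", "RS2");
        Map<String, String> am = new HashMap<>();
        am.put("R1", "RS2");
        am.put("R2", "RS2");
        System.out.println(findMismatches(meta, am));
    }
}
```

Run after each restart cycle, a non-empty result would flag the kind of divergence described in this issue.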
                
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the address from the AM.  This address is initially null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet run on the master side.
> But removal from RIT has already completed on the master side, so this triggers a new assignment.
> So there is a small window, before the opened region is actually added to the online list, in which ServerShutdownHandler can check the existing address in the AM.
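
To make the window concrete, here is a simplified model of the check quoted above. It mirrors the branch structure of the snippet but uses a boolean and plain strings instead of HBase's `RegionState` and `ServerName`; the class, enum, and parameter names are made up for illustration.

```java
/** Simplified stand-in for the ServerShutdownHandler check; not HBase code. */
final class ShutdownCheck {

    enum Action { SKIP_IN_TRANSITION, SKIP_OPENED_ELSEWHERE, REASSIGN }

    /**
     * @param inTransition  true if the region is in RIT and is neither
     *                      CLOSING nor PENDING_CLOSE
     * @param addressFromAM server the AssignmentManager has for the region,
     *                      or null if the online list is not updated yet
     * @param deadServer    the server being processed by the shutdown handler
     */
    static Action decide(boolean inTransition, String addressFromAM,
                         String deadServer) {
        if (inTransition) {
            return Action.SKIP_IN_TRANSITION;
        } else if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return Action.SKIP_OPENED_ELSEWHERE;
        }
        return Action.REASSIGN;
    }

    public static void main(String[] args) {
        // Before the open completes: the region is still in RIT, so it is skipped.
        System.out.println(decide(true, null, "RS1"));
        // Inside the window: RIT already cleared, online list not yet updated.
        System.out.println(decide(false, null, "RS1"));
        // After the window: the AM knows the region is open on RS2.
        System.out.println(decide(false, "RS2", "RS1"));
    }
}
```

Only the middle case leads to a second assignment of a region that is in fact already open elsewhere.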


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Description: 
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates that R1 is on RS3, and both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.



1) Region R1 is assigned from RS1 to RS2.
2) RS1 goes down and ServerShutdownHandler runs.  ServerShutdownHandler finds R1 on RS1 in META, because META has not yet been updated to RS2.
3) Because RS1 went down, R1 is reassigned from RS1 to RS3.
4) RS3 goes down. ServerShutdownHandler processes R1 and tries to assign it to RS2.
5) RS2 says ALREADY_OPENED but META shows RS3.

I was able to reproduce the scenario.
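
The five steps above can be replayed in a toy model to see where the two views diverge. This is only a sketch under simplifying assumptions: META, the AssignmentManager view, and the region servers are plain in-memory maps, an ALREADY_OPENED response is assumed to leave META untouched, and every name here is hypothetical rather than taken from HBase.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the five steps; all names are hypothetical, not HBase code. */
final class StaleMetaScenario {
    final Map<String, String> meta = new HashMap<>();        // region -> server per .META.
    final Map<String, String> am = new HashMap<>();          // region -> server per AssignmentManager
    final Map<String, Set<String>> openOn = new HashMap<>(); // server -> regions it holds open

    /**
     * Asks a server to open a region. Returns false for ALREADY_OPENED, in
     * which case META is left untouched but the master still records the server.
     */
    boolean open(String server, String region) {
        boolean newlyOpened =
            openOn.computeIfAbsent(server, s -> new HashSet<>()).add(region);
        if (newlyOpened) {
            meta.put(region, server);
        }
        am.put(region, server);
        return newlyOpened;
    }

    void serverDown(String server) {
        openOn.remove(server);
    }

    static StaleMetaScenario run() {
        StaleMetaScenario s = new StaleMetaScenario();
        s.open("RS1", "R1"); // initial state: R1 on RS1

        // Step 2 (first half): the shutdown handler reads META while it still says RS1.
        String staleLocation = s.meta.get("R1");

        // Step 1 completes: the balancer move opens R1 on RS2.
        s.open("RS2", "R1");
        s.serverDown("RS1");

        // Step 3: acting on the stale read, the handler reassigns R1 to RS3.
        if ("RS1".equals(staleLocation)) {
            s.open("RS3", "R1");
        }

        // Steps 4 and 5: RS3 goes down, R1 is sent to RS2, which reports ALREADY_OPENED.
        s.serverDown("RS3");
        s.open("RS2", "R1");
        return s;
    }

    public static void main(String[] args) {
        StaleMetaScenario s = run();
        System.out.println("META: " + s.meta.get("R1") + ", AM: " + s.am.get("R1"));
    }
}
```

At the end of the run the model's META entry for R1 is RS3 while the AssignmentManager view says RS2, which matches the symptom reported here.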





  was:
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by load balancer. So .META. table indicated R1 is on RS3. Both RS2 and RS3 think they have R1. Later when RS3 shutdown, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.

T1: Load balancer tried to move R1 from RS1 to RS2. 2011-11-21 14:03:20,812 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., src=skynet-1,60020,1321912978281, dest=skynet-4,60020,1321912999305

T2: RS1 shutdown. 2011-11-21 14:03:24,759 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=skynet-1,60020,1321912978281 to dead servers, submitted shutdown handler to be executed, root=false, meta=true

T3:R1 is opened on RS2. 2011-11-21 14:03:26,131 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. that was online on skynet-4,60020,1321912999305

T4: As part of RS1 shutdown handling, region reassignment starts. It uses the region location captured at T2. 2011-11-21 14:03:26,152 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 32 region(s) that skynet-1,60020,1321912978281 was carrying (skipping 0 regions(s) that are already in transition)

T5: R1 is assigned to RS3. 2011-11-21 14:03:27,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x133b84f9f490000 Retrieved 115 byte(s) of data from znode /hbase/unassigned/ee2e205a60f1bb06cc73bc9df06289df; data=region=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., origin=skynet-3,60020,1321912991430, state=RS_ZK_REGION_OPENED

T6: RS3 shutdown. R1 is reassigned to RS2. 2011-11-21 14:03:37,899 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: ALREADY_OPENED region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. to skynet-4,60020,1321912999305

From the AssignmentManager's point of view, R1 is assigned to RS2. The .META. table indicates the location is RS3.



    


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178318#comment-13178318 ] 

Hudson commented on HBASE-5094:
-------------------------------

Integrated in HBase-0.92-security #57 (See [https://builds.apache.org/job/HBase-0.92-security/57/])
    HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram)

ramkrishna : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Attachment: 5094.patch

Patch for trunk
                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178128#comment-13178128 ] 

Hudson commented on HBASE-5094:
-------------------------------

Integrated in HBase-0.92 #223 (See [https://builds.apache.org/job/HBase-0.92/223/])
    HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram)

ramkrishna : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179164#comment-13179164 ] 

stack commented on HBASE-5094:
------------------------------

Thanks for the fix, Ram.  It does feel, though, like the fix should have been done on the master side rather than out on the regionserver.  In the master we have some hope of making sense of what's going on out on the cluster; this seems like an issue in the master where the shutdown thread and the balancer thread are fighting over a particular region's state.  And this 'fix' only addresses the case where the region reassign arrives at the server that already has it open.

Could we make it such that only one thread can transition a region at a time?

Could shutdown handler not have noticed this state?

{code}
->6)The step 3 continues and he sees addressinAM is null and also RIT is null and so he goes with assignment.
{code}
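
One way to picture the "only one thread can transition a region at a time" idea is a per-region lock that both the balancer path and the shutdown-handler path would have to take. The following is a rough sketch, not a proposal for the actual patch; `RegionTransitionGuard` and `tryTransition` are hypothetical names, and regions are identified by plain strings.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical per-region guard; not part of HBase. */
final class RegionTransitionGuard {
    private final ConcurrentMap<String, ReentrantLock> locks =
        new ConcurrentHashMap<>();

    /**
     * Runs the action only if no other thread is currently transitioning
     * the given region.
     *
     * @return true if the action ran, false if another thread holds the region
     */
    boolean tryTransition(String regionName, Runnable action) {
        ReentrantLock lock =
            locks.computeIfAbsent(regionName, r -> new ReentrantLock());
        if (!lock.tryLock()) {
            return false; // someone else is already moving this region
        }
        try {
            action.run();
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        RegionTransitionGuard guard = new RegionTransitionGuard();
        boolean ran = guard.tryTransition("R1",
            () -> System.out.println("assigning R1"));
        System.out.println("ran=" + ran);
    }
}
```

A caller that gets `false` back would need to decide whether to retry later or skip the region. That policy is the hard part and is left open here.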

                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13179738#comment-13179738 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

@Stack
I tried the option of applying the fix on the master side, but I felt that the history of the region was not available there.  (Maybe I am wrong.)
Also, ServerShutdownhandler.processShutDown() first removes the online server and its list of regions.

Only after the region open triggered by the balancer flow completes is the new server name added back:
{code}
if (isServerOnline(sn)) {
  // Record the new location only if the target server is still online.
  this.regions.put(regionInfo, sn);
  addToServers(sn, regionInfo);
  this.regions.notifyAll();
}
{code}
bq.Could we make it such that only one thread can transition a region at a time?
I have not thought much about this yet.

@Ming Ma
If it is OK, would you mind sharing the patch that you had prepared?

                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Resolution: Fixed
      Assignee: ramkrishna.s.vasudevan
        Status: Resolved  (was: Patch Available)

Committed to 0.92 and trunk.
                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176712#comment-13176712 ] 

Zhihong Yu commented on HBASE-5094:
-----------------------------------

Ram is coming up with a new patch as the previous one didn't fix the problem.
                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Fix Version/s: 0.92.0

Updated the fix versions.


                

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Comment: was deleted

(was: Can we change the position "removing RIT" and "adding to Region set"?)
    

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Comment: was deleted

(was: {code}
RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
ServerName addressFromAM = this.services.getAssignmentManager()
    .getRegionServerOfRegion(e.getKey());
if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
  // Skip regions that were in transition unless CLOSING or
  // PENDING_CLOSE
  LOG.info("Skip assigning region " + rit.toString());
} else if (addressFromAM != null
    && !addressFromAM.equals(this.serverName)) {
  LOG.debug("Skip assigning region "
      + e.getKey().getRegionNameAsString()
      + " because it has been opened in "
      + addressFromAM.getServerName());
}
{code}
In ServerShutdownHandler we look up the region's address in the AM.  This address is still null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet completed on the master side.
But removal from RIT has already completed on the master side, so this triggers a new assignment.
So there is a small window between the region's removal from RIT and its actual addition to the online list; if ServerShutdownHandler checks the existing address in the AM inside that window, it reassigns the region.)
    

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175899#comment-13175899 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

{code}
RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
ServerName addressFromAM = this.services.getAssignmentManager()
    .getRegionServerOfRegion(e.getKey());
if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
  // Skip regions that were in transition unless CLOSING or
  // PENDING_CLOSE
  LOG.info("Skip assigning region " + rit.toString());
} else if (addressFromAM != null
    && !addressFromAM.equals(this.serverName)) {
  LOG.debug("Skip assigning region "
      + e.getKey().getRegionNameAsString()
      + " because it has been opened in "
      + addressFromAM.getServerName());
}
{code}
In ServerShutdownHandler we look up the region's address in the AM.  This address is still null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet completed on the master side.
But removal from RIT has already completed on the master side, so this triggers a new assignment.
So there is a small window between the region's removal from RIT and its actual addition to the online list; if ServerShutdownHandler checks the existing address in the AM inside that window, it reassigns the region.
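
A minimal sketch of that window, written as a toy Java model rather than HBase code. The class AssignmentWindowSketch, its two maps, and the method names are assumptions made for illustration; the real AssignmentManager tracks RegionState and ServerName objects, and the CLOSING / PENDING_CLOSE exception in the first branch is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the master's bookkeeping; illustrative only, not HBase code. */
final class AssignmentWindowSketch {
    // Hypothetical stand-ins for the AM's regions-in-transition and online-region maps.
    final Map<String, String> regionsInTransition = new HashMap<>();
    final Map<String, String> onlineRegions = new HashMap<>();

    /** First half of handling "region opened": the region leaves RIT. */
    void removeFromRit(String region) {
        regionsInTransition.remove(region);
    }

    /** Second half: the region is recorded as online on its new server. */
    void addToOnline(String region, String server) {
        onlineRegions.put(region, server);
    }

    /** Same shape as the quoted check: true means the shutdown handler assigns the region again. */
    boolean shouldReassign(String region, String deadServer) {
        String rit = regionsInTransition.get(region);
        String addressFromAM = onlineRegions.get(region);
        if (rit != null) {
            return false; // still in transition: skip
        }
        if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return false; // already opened on another server: skip
        }
        return true; // nothing known about the region, or it is still listed on the dead server
    }
}
```

Calling shouldReassign("R1", "RS1") after removeFromRit("R1") but before addToOnline("R1", "RS2") returns true, which corresponds to the extra assignment described above.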
                
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3. Both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.
> T1: Load balancer tried to move R1 from RS1 to RS2. 2011-11-21 14:03:20,812 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., src=skynet-1,60020,1321912978281, dest=skynet-4,60020,1321912999305
> T2: RS1 shutdown. 2011-11-21 14:03:24,759 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=skynet-1,60020,1321912978281 to dead servers, submitted shutdown handler to be executed, root=false, meta=true
> T3:R1 is opened on RS2. 2011-11-21 14:03:26,131 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. that was online on skynet-4,60020,1321912999305
> T4: As part of RS1 shutdown handling, region reassignment starts. It uses the region location captured at T2. 2011-11-21 14:03:26,152 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 32 region(s) that skynet-1,60020,1321912978281 was carrying (skipping 0 regions(s) that are already in transition)
> T5: R1 is assigned to RS3. 2011-11-21 14:03:27,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x133b84f9f490000 Retrieved 115 byte(s) of data from znode /hbase/unassigned/ee2e205a60f1bb06cc73bc9df06289df; data=region=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., origin=skynet-3,60020,1321912991430, state=RS_ZK_REGION_OPENED
> T6: RS3 shutdown. R1 is reassigned to RS2. 2011-11-21 14:03:37,899 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: ALREADY_OPENED region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. to skynet-4,60020,1321912999305
> From the AssignmentManager's point of view, R1 is assigned to RS2, but the .META. table indicates that its location is RS3.


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176458#comment-13176458 ] 

Zhihong Yu commented on HBASE-5094:
-----------------------------------

+1 on patch.
                
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: 5094.patch

        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Status: Patch Available  (was: Open)
    

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178051#comment-13178051 ] 

Zhihong Yu commented on HBASE-5094:
-----------------------------------

+1 on patch.

After discussing with Ramkrishna, the task of handling doubly assigned regions would be done in another JIRA.
                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "stack (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180216#comment-13180216 ] 

stack commented on HBASE-5094:
------------------------------

@Ming Open a new issue w/ above scenario?
                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177500#comment-13177500 ] 

Zhihong Yu commented on HBASE-5094:
-----------------------------------

How about introducing a check in the region server's loop [tryRegionServerReport()?], similar to the one you added in the patch?
That way the region server can remove such 'doubly assigned' regions?
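
One possible shape for such a check, sketched in plain Java. Everything here is hypothetical: DoublyAssignedCheck and regionsToClose are made-up names, and comparing against the location recorded in .META. is an assumption, since the comment does not say what the region server would compare against.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper; not part of HBase. */
final class DoublyAssignedCheck {
    /**
     * Returns the regions this server has open although the recorded
     * location (assumed here to come from .META.) names another server.
     */
    static Set<String> regionsToClose(String self,
                                      Set<String> openLocally,
                                      Map<String, String> metaLocation) {
        Set<String> stale = new TreeSet<>();
        for (String region : openLocally) {
            String owner = metaLocation.get(region);
            // A missing entry is left alone: it may simply not be written yet.
            if (owner != null && !owner.equals(self)) {
                stale.add(region);
            }
        }
        return stale;
    }
}
```

In the scenario from this issue, RS2 would find R1 recorded on RS3 and could close its own copy; regions with no recorded location are deliberately skipped in this sketch.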
                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177496#comment-13177496 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

Steps to reproduce the problem:
-> 1) The load balancer starts moving region R1 from RS1 to RS2.
-> 2) Before RS2 has updated the META table, RS1 goes down.
-> 3) So ServerShutdownHandler starts:
        a) it first removes region R1 from the online list in the master,
        b) and it sees R1 on RS1 as per the META entry.
-> 4) At that point RS2 completes the open and updates META.
-> 5) The callback reaches the master, which removes the region from RIT but has not yet added it to the online region list.
-> 6) Step 3 continues: the handler sees that the address in the AM is null and that RIT is also null, so it goes ahead with the assignment.
-> 7) Now R1 is recorded in META as being on RS3 and the operation completes. So the master also stores in its online list that R1 is on RS3.
-> 8) Now RS3 goes down.
-> 9) Region R1 gets reassigned from RS3 to RS2, and RS2 says ALREADY_OPENED.
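
The steps above can be replayed in a small single-threaded Java model to show the end state. This is only a sketch: DoubleAssignmentReplay and its maps are invented for illustration, and it assumes (as the issue description suggests) that an ALREADY_OPENED response updates the master's view but leaves .META. untouched.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy replay of the reproduction steps; illustrative only, not HBase code. */
final class DoubleAssignmentReplay {
    final Map<String, String> meta = new HashMap<>();          // region -> server according to .META.
    final Map<String, String> masterOnline = new HashMap<>();  // region -> server according to the master
    final Map<String, Set<String>> openOn = new HashMap<>();   // server -> regions it believes it has open

    /** A region server opens a region and writes its own name into META. */
    void open(String server, String region) {
        openOn.computeIfAbsent(server, s -> new HashSet<>()).add(region);
        meta.put(region, server);
    }

    /** Master-side assignment. Returns true when the server answers ALREADY_OPENED. */
    boolean assign(String server, String region) {
        Set<String> held = openOn.get(server);
        boolean alreadyOpened = held != null && held.contains(region);
        if (!alreadyOpened) {
            open(server, region);
        }
        masterOnline.put(region, server); // assumption: META is not rewritten on ALREADY_OPENED
        return alreadyOpened;
    }

    /** A server dies: it stops serving, and the master drops its regions from the online list. */
    void serverDown(String server) {
        openOn.remove(server);
        masterOnline.values().removeIf(server::equals);
    }

    static DoubleAssignmentReplay replay() {
        DoubleAssignmentReplay c = new DoubleAssignmentReplay();
        c.open("RS1", "R1");
        c.masterOnline.put("R1", "RS1");
        c.serverDown("RS1");    // steps 2 and 3a
        c.open("RS2", "R1");    // step 4; step 5 leaves the master's online list empty for R1
        c.assign("RS3", "R1");  // steps 6 and 7: the handler finds nothing and assigns to RS3
        c.serverDown("RS3");    // step 8
        c.assign("RS2", "R1");  // step 9: RS2 answers ALREADY_OPENED
        return c;
    }
}
```

After replay(), meta maps R1 to RS3 while masterOnline maps it to RS2, which is the mismatch named in the issue title.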

                

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Ming Ma (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180214#comment-13180214 ] 

Ming Ma commented on HBASE-5094:
--------------------------------

Ram, it turns out that my patch, which is based on an earlier snapshot of the 0.92 code base, is quite similar to the fix in HBASE-4899. In fact, the way the bug is reproduced is also similar. Still, there seems to be a really small time window that neither my fix nor HBASE-4899 covers. The code below refers to the new code added in HBASE-4899.

T1. ServerShutdownHandler: the check "if (rit != null && !rit.isClosing() && !rit.isPendingClose())" returns false because the region is still in the closing state. The region has actually been closed by the RS; the master's state is still "closing" because of the delay in the ZK notification.
T2. Right after the above check, the ZK notification arrives and the master starts opening the region as requested by the load balancer.
T3. "else { this.services.getAssignmentManager().assign(e.getKey(), true); }" is called, triggering another assignment.
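
The three steps form a check-then-act race: the decision at T1 is made on state that changes at T2 before it is acted on at T3. A hedged sketch in Java follows; CheckThenAssignSketch and its methods are hypothetical names, and the guarded assignOnce variant is only one generic way to close such a window, not a description of what HBASE-4899 or the patch on this issue does.

```java
/** Toy model of one region's state on the master; illustrative only, not HBase code. */
final class CheckThenAssignSketch {
    enum State { CLOSING, OPENING }

    private State state = State.CLOSING;
    private int assignCalls = 0;

    /** T1: the shutdown handler's check, taken on its own. */
    synchronized boolean mayAssign() {
        return state == State.CLOSING;
    }

    /** Unguarded assignment, as used by both T2 and T3 in the racy ordering. */
    synchronized void assign() {
        assignCalls++;
        state = State.OPENING;
    }

    /** Check and assignment under one lock: only the first caller assigns. */
    synchronized boolean assignOnce() {
        if (state == State.OPENING) {
            return false;
        }
        assign();
        return true;
    }

    synchronized int assignCalls() {
        return assignCalls;
    }
}
```

In the racy ordering (mayAssign, then the ZK-triggered assign, then the handler's assign) the region is assigned twice. When both paths go through assignOnce, the second caller gets false and no second assignment is issued.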
                
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.92.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Critical
>         Attachments: HBASE-5094_1.patch
>
>
> {code}
> RegionState rit = this.services.getAssignmentManager().isRegionInTransition(e.getKey());
> ServerName addressFromAM = this.services.getAssignmentManager()
>     .getRegionServerOfRegion(e.getKey());
> if (rit != null && !rit.isClosing() && !rit.isPendingClose()) {
>   // Skip regions that were in transition unless CLOSING or
>   // PENDING_CLOSE
>   LOG.info("Skip assigning region " + rit.toString());
> } else if (addressFromAM != null
>     && !addressFromAM.equals(this.serverName)) {
>   LOG.debug("Skip assigning region "
>       + e.getKey().getRegionNameAsString()
>       + " because it has been opened in "
>       + addressFromAM.getServerName());
> }
> {code}
> In ServerShutdownHandler we try to get the address from the AM. This address is initially null because it has not yet been updated after the region was opened, i.e. the callback after node deletion has not yet run on the master side.
> But removal from RIT has already completed on the master side, so this triggers a new assignment.
> So there is a small window, before the opened region is actually added to the online list, in which ServerShutdownHandler checks the existing address in the AM.
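Editorial sketch: the branch order in the quoted snippet can be modeled as a small decision function, which makes the window easier to see. This is a hypothetical model, not HBase code. The Rit and Decision enums are invented for illustration, Rit.NONE stands for "rit == null", and the final ASSIGN branch is an assumption based on the "else { ... assign(...) }" quoted in the comment above, since the snippet itself does not show it.

```java
// Hypothetical model of the branch order in the quoted snippet; not HBase code.
class ReassignDecisionSketch {
    // NONE stands for "no region-in-transition entry" (rit == null).
    enum Rit { NONE, OPENING, CLOSING, PENDING_CLOSE }
    enum Decision { SKIP_IN_TRANSITION, SKIP_OPEN_ELSEWHERE, ASSIGN }

    static Decision decide(Rit rit, String addressFromAM, String deadServer) {
        // In transition, and not CLOSING or PENDING_CLOSE: skip.
        if (rit != Rit.NONE && rit != Rit.CLOSING && rit != Rit.PENDING_CLOSE) {
            return Decision.SKIP_IN_TRANSITION;
        }
        // The AM already records the region on a different, live server: skip.
        if (addressFromAM != null && !addressFromAM.equals(deadServer)) {
            return Decision.SKIP_OPEN_ELSEWHERE;
        }
        // Assumed fall-through: the region is reassigned.
        return Decision.ASSIGN;
    }

    public static void main(String[] args) {
        System.out.println(decide(Rit.OPENING, null, "RS1")); // still in RIT
        System.out.println(decide(Rit.NONE, "RS2", "RS1"));   // online list updated
        System.out.println(decide(Rit.NONE, null, "RS1"));    // the window
    }
}
```

The third call is the window: the region has left RIT but the online list has not been updated yet, so neither skip branch fires and a second assignment is triggered.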

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177497#comment-13177497 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

One point I would like to mention: if steps 8 and 9 do not happen, region R1 will be open on both RS2 and RS3. :(  META will indicate RS3, so any requests to this region will be routed to RS3, but RS2 will still think it has the region.
This patch does not solve this problem.

                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178091#comment-13178091 ] 

Hudson commented on HBASE-5094:
-------------------------------

Integrated in HBase-TRUNK-security #57 (See [https://builds.apache.org/job/HBase-TRUNK-security/57/])
    HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Summary: The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.  (was: The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region unaccessible.)
    
> The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-5094
>                 URL: https://issues.apache.org/jira/browse/HBASE-5094
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>
> R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3. Both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.
> T1: Load balancer tried to move R1 from RS1 to RS2. 2011-11-21 14:03:20,812 INFO org.apache.hadoop.hbase.master.HMaster: balance hri=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., src=skynet-1,60020,1321912978281, dest=skynet-4,60020,1321912999305
> T2: RS1 shutdown. 2011-11-21 14:03:24,759 DEBUG org.apache.hadoop.hbase.master.ServerManager: Added=skynet-1,60020,1321912978281 to dead servers, submitted shutdown handler to be executed, root=false, meta=true
> T3: R1 is opened on RS2. 2011-11-21 14:03:26,131 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has opened the region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. that was online on skynet-4,60020,1321912999305
> T4: As part of RS1 shutdown handling, region reassignment starts. It uses the region location captured at T2. 2011-11-21 14:03:26,152 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Reassigning 32 region(s) that skynet-1,60020,1321912978281 was carrying (skipping 0 regions(s) that are already in transition)
> T5: R1 is assigned to RS3. 2011-11-21 14:03:27,404 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x133b84f9f490000 Retrieved 115 byte(s) of data from znode /hbase/unassigned/ee2e205a60f1bb06cc73bc9df06289df; data=region=tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df., origin=skynet-3,60020,1321912991430, state=RS_ZK_REGION_OPENED
> T6: RS3 shutdown. R1 is reassigned to RS2. 2011-11-21 14:03:37,899 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: ALREADY_OPENED region tableXY,\xB8Q\xEB\x85\x1E\xB8Q\xDF,1321573099841.ee2e205a60f1bb06cc73bc9df06289df. to skynet-4,60020,1321912999305
> From the AssignmentManager's point of view, R1 is assigned to RS2. The .META. table indicates the location is RS3.


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13175925#comment-13175925 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

RegionX is reassigned to RS_C during RS_A shutdown, although RegionX was just assigned to RS_B by the load balancer. So the .META. table indicates RegionX is on RS_C. Both RS_B and RS_C think they have RegionX. Later, when RS_C shuts down, RegionX is reassigned to RS_B. RS_B will indicate ALREADY_OPENED. Thus the region is considered assigned to RS_B even though .META. indicates it is on RS_C.

1) Region RegionX - Assigned from RS_A to RS_B.
2) RS_A goes down and ServerShutdownHandler runs. ServerShutdownHandler finds RegionX with RS_A in .META. because .META. has not yet been updated to RS_B.
3) As RS_A goes down, RegionX is assigned from RS_A to RS_C.
4) RS_C goes down. ServerShutdownHandler processes RegionX and tries to assign it to RS_B.
5) RS_B says ALREADY_OPENED but .META. shows RS_C.
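
Editorial sketch: the five steps can be replayed in a toy model that tracks what .META. says, what the AssignmentManager (AM) says, and which servers actually host RegionX. This is hypothetical illustration code, not HBase code. In particular, treating ALREADY_OPENED as "the AM records RS_B while .META. is left untouched" is a simplification taken from the description above, not from the HBase source.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of steps 1-5; not HBase code.
class StaleMetaSketch {
    // Returns {location recorded in .META., location recorded by the AM}.
    static String[] replay() {
        String meta = "RS_A";
        String am = "RS_A";
        Set<String> hosting = new HashSet<>();
        hosting.add("RS_A");

        // 1) Balancer move: RS_B opens RegionX, .META. is not yet updated.
        hosting.remove("RS_A");
        hosting.add("RS_B");

        // 2) RS_A dies; the handler still finds RegionX on RS_A in .META.
        boolean looksOrphaned = meta.equals("RS_A");

        // 3) RegionX is assigned to RS_C; a real open updates .META. and the AM.
        if (looksOrphaned && hosting.add("RS_C")) {
            meta = "RS_C";
            am = "RS_C";
        }

        // 4) RS_C dies; the handler tries to assign RegionX to RS_B.
        hosting.remove("RS_C");

        // 5) RS_B already hosts RegionX (ALREADY_OPENED): modeled as the AM
        //    recording RS_B while .META. is left untouched.
        if (!hosting.add("RS_B")) {
            am = "RS_B";
        }
        return new String[] { meta, am };
    }

    public static void main(String[] args) {
        String[] r = replay();
        System.out.println(".META.=" + r[0] + " AM=" + r[1]);
    }
}
```

The replay ends with .META. pointing at RS_C and the AM pointing at RS_B, which is the mismatch named in the issue summary.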
                


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178101#comment-13178101 ] 

Hudson commented on HBASE-5094:
-------------------------------

Integrated in HBase-TRUNK #2600 (See [https://builds.apache.org/job/HBase-TRUNK/2600/])
    HBASE-5094 The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible. (Ram)

ramkrishna : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java

                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ramkrishna.s.vasudevan updated HBASE-5094:
------------------------------------------

    Description: 
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3. Both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.

1) Region R1 - Assigned from RS1 to RS2.
2) RS1 goes down and ServerShutdownHandler runs. ServerShutdownHandler finds R1 with RS1 in META because META has not yet been updated to RS2.
3) As RS1 goes down, R1 is assigned from RS1 to RS3.
4) RS3 goes down. ServerShutdownHandler processes R1 and tries to assign it to RS2.
5) RS2 says ALREADY_OPENED but META shows RS3.

I was able to reproduce the scenario in 0.92.

  was:
R1 is reassigned to RS3 during RS1 shutdown, even though R1 was just assigned to RS2 by the load balancer. So the .META. table indicates R1 is on RS3. Both RS2 and RS3 think they have R1. Later, when RS3 shuts down, R1 is reassigned to RS2. RS2 will indicate ALREADY_OPENED. Thus the region is considered assigned to RS2 even though .META. indicates it is on RS3.

1) Region R1 - Assigned from RS1 to RS2.
2) RS1 goes down and ServerShutdownHandler runs. ServerShutdownHandler finds RS1 with R1 in META because META has not yet been updated to RS2.
3) As RS1 goes down, R1 is assigned from RS1 to RS3.
4) RS3 goes down. ServerShutdownHandler processes R1 and tries to assign it to RS2.
5) RS2 says ALREADY_OPENED but META shows RS3.

I was able to reproduce the scenario.

    


        

[jira] [Commented] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "ramkrishna.s.vasudevan (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13178063#comment-13178063 ] 

ramkrishna.s.vasudevan commented on HBASE-5094:
-----------------------------------------------

Committed to trunk and 0.92.

Thanks for the review, Ted.
                


        

[jira] [Updated] (HBASE-5094) The META can hold an entry for a region with a different server name from the one actually in the AssignmentManager thus making the region inaccessible.

Posted by "Zhihong Yu (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-5094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Yu updated HBASE-5094:
------------------------------

    Attachment:     (was: 5094.patch)
    
