Posted to issues@hbase.apache.org by "chunhui shen (JIRA)" <ji...@apache.org> on 2012/08/15 08:58:37 UTC

[jira] [Created] (HBASE-6587) Region would be assigned twice in the case of all RS offline

chunhui shen created HBASE-6587:
-----------------------------------

             Summary: Region would be assigned twice in the case of all RS offline
                 Key: HBASE-6587
                 URL: https://issues.apache.org/jira/browse/HBASE-6587
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.94.1
            Reporter: chunhui shen
            Assignee: chunhui shen


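In the TimeoutMonitor, we act on timeout for a region in transition if (this.allRegionServersOffline && !noRSAvailable), that is, when all region servers were offline and at least one has just come back online, even if the region's own timeout has not yet elapsed.
The code is as follows: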
{code}
if (regionState.getStamp() + timeout <= now ||
    (this.allRegionServersOffline && !noRSAvailable)) {
  // Decide on an action upon timeout or, if some RSs just came back online,
  // we can start the assignment.
  actOnTimeOut(regionState);
}
{code}

However, we found a bug: this condition also acts on timeout for a region that was assigned only a moment ago, which causes the region to be assigned twice.
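
To make the failure mode concrete, here is a minimal, self-contained sketch of the condition quoted above. The class name TimeoutConditionSketch, the method signature, and the 30-minute timeout value are assumptions made for illustration; only the boolean expression itself is taken from the quoted snippet.

```java
// Illustrative model only: this is not the AssignmentManager code itself.
final class TimeoutConditionSketch {

    /**
     * Same boolean expression as the quoted snippet, with the fields passed
     * in as plain parameters (hypothetical signature).
     */
    static boolean shouldActOnTimeout(long stamp, long timeout, long now,
                                      boolean allRegionServersOffline,
                                      boolean noRSAvailable) {
        return stamp + timeout <= now
            || (allRegionServersOffline && !noRSAvailable);
    }

    public static void main(String[] args) {
        long timeout = 30L * 60L * 1000L; // assumed value, for illustration
        long stamp = 1_000_000L;          // region state last updated here
        long now = stamp + 200L;          // monitor wakes up 200 ms later

        // Normal case: the region has not timed out, so nothing happens.
        System.out.println(
            shouldActOnTimeout(stamp, timeout, now, false, false)); // false

        // Bug case: all RSs were offline and one just came back. The second
        // clause is true, so a region assigned 200 ms ago is acted on anyway.
        System.out.println(
            shouldActOnTimeout(stamp, timeout, now, true, false));  // true
    }
}
```

The second clause never looks at the region's timestamp, so it cannot tell a region that has been waiting for a server from one that was handed to a server a moment ago.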


Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
{code}
2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1344948174367, server=null
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x438f53bbf9b0acd Creating (or updating) unassigned node for 277b9b6df6de2b9be1353b4fa25f4222 with OFFLINE state
2012-08-14 20:44:31,643 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, region=277b9b6df6de2b9be1353b4fa25f4222
// Abnormal timeout
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
{code}
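
The timestamps in the log show how early the timeout fired. The sketch below computes the gap between the region entering OPENING (ts=1344948272279) and the "timed out" message logged at 20:44:32,518. It assumes the master log uses a UTC+8 local clock, which is the offset that makes the quoted ts values line up with the log times; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Illustrative helper for reading the log excerpt; not part of HBase.
final class LogTimestampSketch {

    // Timestamp layout used at the start of each log line.
    private static final DateTimeFormatter LOG_TIME =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /**
     * Milliseconds between a region state timestamp (epoch millis) and a
     * log line's wall-clock time, given the log's assumed UTC offset.
     */
    static long elapsedMillis(String logTime, long stateTs, ZoneOffset offset) {
        long logMillis = LocalDateTime.parse(logTime, LOG_TIME)
            .toInstant(offset)
            .toEpochMilli();
        return logMillis - stateTs;
    }

    public static void main(String[] args) {
        // Assumption: the log clock is UTC+8.
        ZoneOffset assumedOffset = ZoneOffset.ofHours(8);
        long elapsed = elapsedMillis(
            "2012-08-14 20:44:32,518", 1344948272279L, assumedOffset);
        System.out.println(elapsed + " ms"); // 239 ms
    }
}
```

Under that assumption the region had been in OPENING for only 239 ms when it was reported as timed out, which matches the description of a region that was assigned just now.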

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436009#comment-13436009 ] 

Hudson commented on HBASE-6587:
-------------------------------

Integrated in HBase-TRUNK #3223 (See [https://builds.apache.org/job/HBase-TRUNK/3223/])
    HBASE-6587 Region would be assigned twice in the case of all RS offline (Chunhui) (Revision 1373829)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>
>


[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "chunhui shen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chunhui shen updated HBASE-6587:
--------------------------------

    Attachment: HBASE-6587.patch

My solution is to check the region plan before acting on timeout for the region in this case.
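
One way to read this proposal is sketched below. This is not the attached patch: the class name, the method signature, and the rule "skip the region when it already has a plan whose destination server is online" are assumptions chosen to illustrate what checking the region plan could look like.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative sketch of a plan check; hypothetical, not the attached patch.
final class RegionPlanCheckSketch {

    /**
     * @param plannedDestination destination server of the region's existing
     *                           plan, or null if the region has no plan
     * @param onlineServers      servers currently online
     */
    static boolean shouldActOnTimeout(long stamp, long timeout, long now,
                                      boolean allRegionServersOffline,
                                      boolean noRSAvailable,
                                      String plannedDestination,
                                      Set<String> onlineServers) {
        if (stamp + timeout <= now) {
            return true; // genuine timeout: always act
        }
        if (allRegionServersOffline && !noRSAvailable) {
            // Assumed rule: a region already planned onto a live server is
            // being assigned right now, so leave it alone.
            return plannedDestination == null
                || !onlineServers.contains(plannedDestination);
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> online = new HashSet<>();
        online.add("dw92.kgb.sqa.cm4,60020,1344948267642");

        long timeout = 30L * 60L * 1000L; // assumed value, for illustration
        long stamp = 1344948272279L;
        long now = stamp + 239L;

        // Region just planned onto the live server: not acted on.
        System.out.println(shouldActOnTimeout(stamp, timeout, now, true, false,
            "dw92.kgb.sqa.cm4,60020,1344948267642", online)); // false

        // Region with no plan yet: assignment can start right away.
        System.out.println(shouldActOnTimeout(stamp, timeout, now, true, false,
            null, online)); // true
    }
}
```

With a check of this kind, the early-assignment shortcut still helps regions that were stuck while no server was available, but no longer touches a region that is already on its way to a live server.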
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: HBASE-6587.patch
>
>


[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435012#comment-13435012 ] 

Hadoop QA commented on HBASE-6587:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541012/HBASE-6587.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2584//console

This message is automatically generated.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: HBASE-6587.patch
>
>


[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6587:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Committed to 0.94 as well.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch
>
>


[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6587:
----------------------------------

    Status: Patch Available  (was: Open)
    
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>         Attachments: HBASE-6587.patch
>
>


[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6587:
---------------------------------

    Fix Version/s: 0.94.2
    
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587.patch, HBASE-6587.patch
>
>


[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435734#comment-13435734 ] 

Hadoop QA commented on HBASE-6587:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12541159/6587.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    -1 javac.  The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).

    -1 findbugs.  The patch appears to introduce 9 new Findbugs (version 1.3.9) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2595//console

                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>
>
> In the TimeoutMonitor, we would act on time out for the regions if (this.allRegionServersOffline && !noRSAvailable)
> The code is as the following:
> {code}
>  if (regionState.getStamp() + timeout <= now ||
>           (this.allRegionServersOffline && !noRSAvailable)) {
>           //decide on action upon timeout or, if some RSs just came back online, we can start the
>           // the assignment
>           actOnTimeOut(regionState);
>         }
> {code}
> But we found it exists a bug that it would act on time out for the region which was assigned just now , and cause assigning the region twice.
> Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
> {code}
> 2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1
> 344948174367, server=null
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writete
> st,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be13
> 53b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x438f53bbf9b0acd Creating (or updating) unassigned node for 277b9b6df6de2b9be13
> 53b4fa25f4222 with OFFLINE state
> 2012-08-14 20:44:31,643 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa
> 25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, 
> region=277b9b6df6de2b9be1353b4fa25f4222
> // abnormal timeout
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df
> 6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,
> 1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated HBASE-6587:
---------------------------------

    Attachment: 6587-0.94.patch

0.94 patch.
Please have a look.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch
>

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435957#comment-13435957 ] 

Zhihong Ted Yu commented on HBASE-6587:
---------------------------------------

Integrated to trunk.

Thanks for the patch, Chunhui.

Thanks for the review, Ram.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445645#comment-13445645 ] 

Hudson commented on HBASE-6587:
-------------------------------

Integrated in HBase-0.94 #443 (See [https://builds.apache.org/job/HBase-0.94/443/])
    HBASE-6587 Region would be assigned twice in the case of all RS offline (Revision 1379242)

     Result = SUCCESS
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch
>

[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6587:
----------------------------------

    Fix Version/s: 0.96.0
     Hadoop Flags: Reviewed

Looks good to me.
nit: insert space between } and else, between if and (.
{code}
+        }else if(this.allRegionServersOffline && !noRSAvailable){
{code}
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: HBASE-6587.patch
>

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436440#comment-13436440 ] 

Hudson commented on HBASE-6587:
-------------------------------

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #132 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/132/])
    HBASE-6587 Region would be assigned twice in the case of all RS offline (Chunhui) (Revision 1373829)

     Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587.patch, HBASE-6587.patch
>

[jira] [Updated] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhihong Ted Yu updated HBASE-6587:
----------------------------------

    Attachment: 6587.patch

Patch with minor reformatting.

Going to integrate tomorrow if there is no objection.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>

[jira] [Comment Edited] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Zhihong Ted Yu (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435805#comment-13435805 ] 

Zhihong Ted Yu edited comment on HBASE-6587 at 8/17/12 12:37 AM:
-----------------------------------------------------------------

@ram
{code}
2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1
344948174367, server=null
{code}

After the above log, TimeoutMonitor set allRegionServersOffline to true.

{code}2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writete
st,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be13
53b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available {code}

At 2012-08-14 20:44:31, one server came online, and region 277b9b6df6de2b9be1353b4fa25f4222 was successfully assigned.

However, at that time the TimeoutMonitor, in its chore(), would act on time out because of the following if statement:
{code}
if (this.allRegionServersOffline && !allRSsOffline) return true;
{code}
So we see the following log:
{code}
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df
6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,
1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
{code}

The region was assigned at 2012-08-14 20:44:31, but was timed out by the TimeoutMonitor at 2012-08-14 20:44:32.
This caused a collision between two assign threads,
and as a result the region only came online after 30 minutes.
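
To make the sequence above concrete, here is a minimal Java sketch of the check. It is not the AssignmentManager code and not the attached patch: the class TimeoutCheckSketch, both methods, the assignmentInProgress flag, and the 30-minute timeout value are assumptions made for illustration. originalCheck mirrors the condition quoted in the issue description; guardedCheck shows one possible way to skip regions that already have an assignment in flight.

```java
// Illustrative sketch only. All names here are hypothetical and are not
// part of HBase's AssignmentManager API.
public class TimeoutCheckSketch {

    /**
     * The condition as quoted in the issue: act when the timeout has elapsed,
     * OR when all region servers were offline and some are available again.
     */
    public static boolean originalCheck(long stamp, long timeout, long now,
                                        boolean allRegionServersOffline,
                                        boolean noRSAvailable) {
        return stamp + timeout <= now
                || (allRegionServersOffline && !noRSAvailable);
    }

    /**
     * Hypothetical guard: when servers have just come back, act only on
     * regions that do not already have an assignment in progress.
     */
    public static boolean guardedCheck(long stamp, long timeout, long now,
                                       boolean allRegionServersOffline,
                                       boolean noRSAvailable,
                                       boolean assignmentInProgress) {
        if (stamp + timeout <= now) {
            // A genuine timeout is always acted on.
            return true;
        } else if (allRegionServersOffline && !noRSAvailable) {
            // Servers just came back: skip regions that were already handed out.
            return !assignmentInProgress;
        }
        return false;
    }

    public static void main(String[] args) {
        long stamp = 1344948272279L;      // ts from the "state=OPENING" log line
        long now = stamp + 239L;          // the "timed out" line is logged 239 ms later
        long timeout = 30L * 60L * 1000L; // assumed timeout value, for illustration only

        // The region is OPENING on the server that just came online.
        boolean original = originalCheck(stamp, timeout, now, true, false);
        boolean guarded = guardedCheck(stamp, timeout, now, true, false, true);

        System.out.println("originalCheck acts on the region: " + original); // true
        System.out.println("guardedCheck acts on the region:  " + guarded);  // false
    }
}
```

With the timestamps from the log, originalCheck fires even though the timeout has not elapsed, which matches the second assignment seen at 20:44:32. guardedCheck leaves the freshly assigned region alone but still reacts to a real timeout.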
                
      was (Author: zjushch):
    @ram
{code}
2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1
344948174367, server=null
{code}

After the above log, TimeoutMonitor set allRegionServersOffline to true.

{code}2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writete
st,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be13
53b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available {code}

At 2012-08-14 20:44:31, one server came online, and region 277b9b6df6de2b9be1353b4fa25f4222 was successfully assigned.

However, at that time the TimeoutMonitor, in its chore(), would act on time out because the if condition {code}if (this.allRegionServersOffline && !allRSsOffline){code} returned true.

So we see the following log:
{code}2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df
6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,
1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
{code}

The region was assigned at 2012-08-14 20:44:31, but was timed out by the TimeoutMonitor at 2012-08-14 20:44:32.
This caused a collision between two assign threads,
and as a result the region only came online after 30 minutes.
                  
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0
>
>         Attachments: 6587.patch, HBASE-6587.patch
>

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445101#comment-13445101 ] 

Lars Hofhansl commented on HBASE-6587:
--------------------------------------

Going to commit the 0.94 patch soon. It is just a backport of the trunk patch.
                
> Region would be assigned twice in the case of all RS offline
> ------------------------------------------------------------
>
>                 Key: HBASE-6587
>                 URL: https://issues.apache.org/jira/browse/HBASE-6587
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: chunhui shen
>            Assignee: chunhui shen
>             Fix For: 0.96.0, 0.94.2
>
>         Attachments: 6587-0.94.patch, 6587.patch, HBASE-6587.patch
>
>
> In the TimeoutMonitor, we act on timeout for the regions if (this.allRegionServersOffline && !noRSAvailable) holds.
> The code is as follows:
> {code}
>  if (regionState.getStamp() + timeout <= now ||
>           (this.allRegionServersOffline && !noRSAvailable)) {
>           //decide on action upon timeout or, if some RSs just came back online, we can start the
>           // the assignment
>           actOnTimeOut(regionState);
>         }
> {code}
> But we found a bug: it acts on timeout for a region that was assigned just now, which causes the region to be assigned twice.
> Master log for the region 277b9b6df6de2b9be1353b4fa25f4222:
> {code}
> 2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1344948174367, server=null
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
> 2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x438f53bbf9b0acd Creating (or updating) unassigned node for 277b9b6df6de2b9be1353b4fa25f4222 with OFFLINE state
> 2012-08-14 20:44:31,643 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Assigning region writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. to dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENING, server=dw92.kgb.sqa.cm4,60020,1344948267642, region=277b9b6df6de2b9be1353b4fa25f4222
> // abnormal timeout
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
> 2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
> {code}


[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435772#comment-13435772 ] 

ramkrishna.s.vasudevan commented on HBASE-6587:
-----------------------------------------------

@Chunhui
The intention of your solution is valid, but I have a few questions to clarify the scenario.
When the assignment first started,
{code}
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
{code}
there was at least one server, right?
Then the timeout monitor thread saw that there were no region servers online, so the flag allRegionServersOffline would be set to true.
In that case the previous assignment has already failed, right, as there is no RS? I am sure I am missing something here. Can you tell me how the double assignment happened?
                

       

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Lars Hofhansl (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435999#comment-13435999 ] 

Lars Hofhansl commented on HBASE-6587:
--------------------------------------

Looks like a worthy addition to 0.94 as well
                

       

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "chunhui shen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435805#comment-13435805 ] 

chunhui shen commented on HBASE-6587:
-------------------------------------

@ram
{code}
2012-08-14 20:42:54,367 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Unable to determine a plan to assign .META.,,1.1028785192 state=OFFLINE, ts=1344948174367, server=null
{code}

After the above log, TimeoutMonitor set allRegionServersOffline to true.

{code}
2012-08-14 20:44:31,640 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: No previous transition plan was found (or we are ignoring an existing plan) for writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. so generated a random one; hri=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222., src=, dest=dw92.kgb.sqa.cm4,60020,1344948267642; 1 (online=1, available=1) available servers
{code}

At 2012-08-14 20:44:31, one server came online and region 277b9b6df6de2b9be1353b4fa25f4222 was successfully assigned.

However, at that time TimeoutMonitor, in its chore(), would act on timeout because the condition {code}if (this.allRegionServersOffline && !allRSsOffline){code} evaluated to true.

So we see the following log:
{code}
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out: writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222. state=OPENING, ts=1344948272279, server=dw92.kgb.sqa.cm4,60020,1344948267642
2012-08-14 20:44:32,518 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=writetest,VHXYHJN0BL48HMR4DI1L,1344925649429.277b9b6df6de2b9be1353b4fa25f4222.
{code}

The region was assigned at 2012-08-14 20:44:31, but was timed out by TimeoutMonitor at 2012-08-14 20:44:32.
This caused a collision between two assign threads, and as a result the region only came online after 30 minutes.
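
The race can be reproduced in isolation with a minimal Java sketch of the condition quoted in the issue description. This is not the HBase code itself: the class TimeoutCheckSketch, the method shouldActOnTimeout, and the 30-minute timeout value are assumptions made for illustration. The two timestamps are taken from the master log (the OPENING state ts and the time of the timeout message).

```java
public class TimeoutCheckSketch {

    /**
     * Hypothetical stand-alone copy of the condition from the issue
     * description; the parameter names follow the quoted snippet.
     */
    public static boolean shouldActOnTimeout(long stamp, long timeoutMs, long now,
                                             boolean allRegionServersOffline,
                                             boolean noRSAvailable) {
        return stamp + timeoutMs <= now
            || (allRegionServersOffline && !noRSAvailable);
    }

    public static void main(String[] args) {
        long timeoutMs = 30L * 60L * 1000L; // assumed timeout of 30 minutes
        long stamp = 1344948272279L;        // ts of the OPENING state in the log
        long now = 1344948272518L;          // 20:44:32,518, when the chore ran

        // The flag is still true from the period when no RS was online,
        // but a server is available again, so noRSAvailable is false.
        boolean staleFlag = shouldActOnTimeout(stamp, timeoutMs, now, true, false);

        // Same region state, but with the flag already cleared.
        boolean clearedFlag = shouldActOnTimeout(stamp, timeoutMs, now, false, false);

        System.out.println("elapsed ms since OPENING: " + (now - stamp));
        System.out.println("acts on timeout with stale flag: " + staleFlag);
        System.out.println("acts on timeout with cleared flag: " + clearedFlag);
    }
}
```

With the stale flag the check fires only 239 ms after the region entered OPENING, even though the real timeout has not elapsed; with the flag cleared it does not fire. The sketch only shows the condition, not how the attached patch addresses it.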
                

       

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448264#comment-13448264 ] 

Hudson commented on HBASE-6587:
-------------------------------

Integrated in HBase-0.94-security #51 (See [https://builds.apache.org/job/HBase-0.94-security/51/])
    HBASE-6587 Region would be assigned twice in the case of all RS offline (Revision 1379242)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "ramkrishna.s.vasudevan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435836#comment-13435836 ] 

ramkrishna.s.vasudevan commented on HBASE-6587:
-----------------------------------------------

Thanks Chunhui.  I am +1.
                

       

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444640#comment-13444640 ] 

Hadoop QA commented on HBASE-6587:
----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12543028/6587-0.94.patch
  against trunk revision .

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2736//console

This message is automatically generated.
                

[jira] [Commented] (HBASE-6587) Region would be assigned twice in the case of all RS offline

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448353#comment-13448353 ] 

Hudson commented on HBASE-6587:
-------------------------------

Integrated in HBase-0.94-security-on-Hadoop-23 #7 (See [https://builds.apache.org/job/HBase-0.94-security-on-Hadoop-23/7/])
    HBASE-6587 Region would be assigned twice in the case of all RS offline (Revision 1379242)

     Result = FAILURE
larsh : 
Files : 
* /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java

                