Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/01/04 02:31:49 UTC

[jira] Created: (HBASE-3406) Region stuck in transition after RS failed while opening

Region stuck in transition after RS failed while opening
--------------------------------------------------------

                 Key: HBASE-3406
                 URL: https://issues.apache.org/jira/browse/HBASE-3406
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.90.0
            Reporter: Todd Lipcon
            Priority: Critical
             Fix For: 0.90.0


I had an RS fail due to a GC pause while it was apparently in the midst of opening a region. This got the region stuck in the following repeating sequence in the master log:

2011-01-03 17:24:33,884 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.
2011-01-03 17:24:33,885 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70., server=haus03.sf.cloudera.com:60000, state=M_ZK_REGION_OFFLINE
2011-01-03 17:24:43,886 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70. state=OPENING, ts=1293840977790
2011-01-03 17:24:43,886 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70.
2011-01-03 17:24:43,887 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x12ce26f6c0600e3 Retrieved 113 byte(s) of data from znode /hbase/unassigned/c6a54b4d07a44e113b3a4d2ab22daa70; data=region=usertable,user991629466,1293747979500.c6a54b4d07a44e113b3a4d2ab22daa70., server=haus03.sf.cloudera.com:60000, state=M_ZK_REGION_OFFLINE

etc., repeating every 10 seconds. Eventually I ran hbck -fix, which forced it to OFFLINE in ZK, and it reassigned just fine.
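The repeating log pattern above can be illustrated with a small, hypothetical Java model. It assumes (this is not confirmed anywhere in the thread) that the master's periodic timeout check, on finding the znode in a state other than the expected RS_ZK_REGION_OPENING, skips the region without changing its in-memory state or its timestamp. Under that assumption the region times out again on every pass, which matches the "repeating every 10 seconds" symptom. StuckTimeoutModel, chore(), TIMEOUT_MS and the other names are invented for illustration and are not HBase APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the repeating master-log pattern.
// NOT the HBase 0.90 AssignmentManager: every class, field and method
// here is invented for illustration.
class StuckTimeoutModel {
    // Two of the znode states named in the log; only these are modeled.
    enum ZkState { M_ZK_REGION_OFFLINE, RS_ZK_REGION_OPENING }

    // Assumed timeout, chosen to match the 10-second repeat seen in the log.
    static final long TIMEOUT_MS = 10_000;

    String inMemoryState = "OPENING";            // master's in-memory view
    ZkState znode = ZkState.M_ZK_REGION_OFFLINE; // what the master reads from ZK
    long lastUpdateMs;                           // timestamp of the in-memory state
    final List<String> log = new ArrayList<>();

    StuckTimeoutModel(long lastUpdateMs) {
        this.lastUpdateMs = lastUpdateMs;
    }

    // One pass of a hypothetical periodic timeout check.
    void chore(long nowMs) {
        if (!"OPENING".equals(inMemoryState) || nowMs - lastUpdateMs < TIMEOUT_MS) {
            return; // nothing has timed out
        }
        log.add("Region has been OPENING for too long, reassigning");
        if (znode != ZkState.RS_ZK_REGION_OPENING) {
            // Assumed failure mode: on an unexpected znode state the pass gives
            // up without touching inMemoryState or lastUpdateMs, so the next
            // pass times out again and logs the same line.
            return;
        }
        // Expected path (never reached while the znode says OFFLINE):
        // reset the in-memory state so the region can be assigned afresh.
        inMemoryState = "OFFLINE";
        lastUpdateMs = nowMs;
    }

    public static void main(String[] args) {
        StuckTimeoutModel m = new StuckTimeoutModel(0);
        for (long t = 10_000; t <= 60_000; t += 10_000) {
            m.chore(t);
        }
        // Prints: 6 reassign attempts, state=OPENING
        System.out.println(m.log.size() + " reassign attempts, state=" + m.inMemoryState);
    }
}
```

The model only shows that a timeout branch which neither transitions the region nor resets its timestamp produces the same log line on every pass. It says nothing about how the in-memory/ZK mismatch arose, which the thread leaves unexplained.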

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3406:
-------------------------


I agree this is a critical issue for 0.90.2.

[jira] Updated: (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HBASE-3406:
-------------------------------

    Fix Version/s:     (was: 0.90.1)
                   0.90.2

Bumping to 0.90.2 since we haven't investigated this enough yet.

[jira] [Resolved] (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3406.
--------------------------

    Resolution: Duplicate

Marking as a duplicate of HBASE-3669 (please reopen if anyone disagrees).

[jira] [Commented] (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010755#comment-13010755 ] 

Jean-Daniel Cryans commented on HBASE-3406:
-------------------------------------------

Yeah, they seem to be different instances of the same issue.

[jira] [Commented] (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010563#comment-13010563 ] 

stack commented on HBASE-3406:
------------------------------

I want to close this as a dup of HBASE-3669 if that's OK. HBASE-3669 has a more helpful log.

[jira] Commented: (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "Bill Graham (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12985543#action_12985543 ] 

Bill Graham commented on HBASE-3406:
------------------------------------

I got myself into the same state. I was able to do this using 0.90.0-rc1 and CDH3b2 by either:

- Changing COMPRESSION => 'lzo' on an existing table where lzo is not set up on HBase.
- Creating a new table with COMPRESSION => 'lzo' where lzo is not set up on HBase.

In my case, though, hbck -fix wouldn't work. I had to restart the cluster and then run hbck -fix. See this thread for more info:
http://mail-archives.apache.org/mod_mbox/hbase-user/201101.mbox/%3CAANLkTi=GRWBhOVM=YjKMODfm8XyVN5T0XEkdb4FF2CrP@mail.gmail.com%3E


[jira] Updated: (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "stack (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-3406:
-------------------------

    Fix Version/s:     (was: 0.90.0)
                   0.90.1

Moving to 0.90.1

Without more context, I cannot explain how the in-memory state for the region is OPENING while the znode content is M_ZK_REGION_OFFLINE.

[jira] [Commented] (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13010560#comment-13010560 ] 

stack commented on HBASE-3406:
------------------------------

I don't have enough info to figure out what's going on here. I want to punt to 0.90.3 until we have logs from an actual failure.

[jira] Commented: (HBASE-3406) Region stuck in transition after RS failed while opening

Posted by "Cosmin Lehene (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-3406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13007890#comment-13007890 ] 

Cosmin Lehene commented on HBASE-3406:
--------------------------------------

I've seen this as well. At the same time, the HRegionServer was getting a message to close the region, but since it wasn't actually serving the region, it was reporting an error as well, creating this infinite loop.
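The close/error cycle described in this comment can be sketched the same way. In this hypothetical Java model, the master repeatedly asks a region server to close a region the server never brought online; the server answers with an error each time, and the master (by assumption, not confirmed in the thread) leaves the region in transition, so every pass repeats the same exchange. CloseLoopModel, NotServingException, timeoutPass() and run() are invented names, not HBase classes or RPCs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the close/reassign cycle. Names are invented;
// this is not HBase's RPC or handler code.
class CloseLoopModel {
    // Stand-in for the region server's "not serving this region" error.
    static class NotServingException extends Exception {
        NotServingException(String region) {
            super("not serving " + region);
        }
    }

    // Regions the simulated region server actually has online.
    final Set<String> onlineRegions = new HashSet<>();

    // Region-server side: closing a region it does not host is an error.
    void closeRegion(String region) throws NotServingException {
        if (!onlineRegions.remove(region)) {
            throw new NotServingException(region);
        }
    }

    // Master side of one timeout pass: close the region so it could then be
    // reassigned. Returns true if the region left the stuck state.
    boolean timeoutPass(String region) {
        try {
            closeRegion(region);
            return true;
        } catch (NotServingException e) {
            // Assumed behaviour: the error is reported and the region stays
            // in transition, so the next pass sends the same close again.
            return false;
        }
    }

    // Runs up to maxPasses passes; returns how many ended without progress.
    int run(String region, int maxPasses) {
        int failed = 0;
        for (int i = 0; i < maxPasses; i++) {
            if (timeoutPass(region)) {
                break;
            }
            failed++;
        }
        return failed;
    }

    public static void main(String[] args) {
        String region = "c6a54b4d07a44e113b3a4d2ab22daa70";
        CloseLoopModel rs = new CloseLoopModel(); // region never came online
        // Prints: 5 of 5 passes made no progress
        System.out.println(rs.run(region, 5) + " of 5 passes made no progress");
    }
}
```

The bound on passes exists only so the model terminates; in the reported scenario nothing bounds the loop, which is why the region stayed stuck until someone intervened manually.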


