Posted to issues@hbase.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2011/03/14 21:15:29 UTC

[jira] Created: (HBASE-3637) Region stuck in OPENED state

Region stuck in OPENED state
----------------------------

                 Key: HBASE-3637
                 URL: https://issues.apache.org/jira/browse/HBASE-3637
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon
            Priority: Critical
             Fix For: 0.92.0


I don't 100% understand how this happened, but the following was observed:

- META is in the OPENED state in ZK, for a server which no longer exists
- The handler sees that the server is dead, and assumes the RIT (region in transition) timeout will handle it
- The RIT timeout sees that the region is already in the OPENED state, and assumes the OPENED handler will handle it
- The region loops in the timeout state forever and is never actually reassigned
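
The mutual deferral in the list above can be illustrated with a toy model. This is a minimal sketch and not HBase code: the function names, the return strings, and the server names are hypothetical stand-ins for the behavior observed, and the real master logic may differ.

```python
# Toy model of the stuck-region loop. None of these names are HBase APIs;
# they only mirror the observed behavior described in this report.

def znode_handler(znode, live_servers):
    """Models the master processing a region-in-transition znode found in ZK."""
    if znode["server"] not in live_servers:
        # The server is dead, so defer to the RIT timeout monitor.
        return "defer_to_rit_timeout"
    return "handle_opened"


def rit_timeout_check(znode):
    """Models the RIT timeout monitor inspecting a timed-out region."""
    if znode["state"] == "RS_ZK_REGION_OPENED":
        # The znode already says OPENED, so defer to the OPENED handler.
        return "defer_to_opened_handler"
    return "reassign"


def simulate(znode, live_servers, timeout_periods):
    """Return every action taken; the region is reassigned only if one is 'reassign'."""
    actions = [znode_handler(znode, live_servers)]
    for _ in range(timeout_periods):
        actions.append(rit_timeout_check(znode))
    return actions


# Hypothetical stale znode: META marked OPENED on a server that is gone.
meta = {"region": ".META.,,1", "server": "dead-server", "state": "RS_ZK_REGION_OPENED"}
actions = simulate(meta, live_servers={"live-server"}, timeout_periods=3)
print(actions)
print("reassigned:", "reassign" in actions)
```

In this model each side defers to the other, so no number of timeout periods ever produces a `reassign` action for the stale znode.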

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Resolved: (HBASE-3637) Region stuck in OPENED state

Posted by "stack (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack resolved HBASE-3637.
--------------------------

    Resolution: Won't Fix

Will fix HBASE-3638 instead

[jira] Commented: (HBASE-3637) Region stuck in OPENED state

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006708#comment-13006708 ] 

stack commented on HBASE-3637:
------------------------------

OK if I close this in favor of HBASE-3638, Todd?

[jira] Commented: (HBASE-3637) Region stuck in OPENED state

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006619#comment-13006619 ] 

Todd Lipcon commented on HBASE-3637:
------------------------------------

Some more context: this was from a test that was looping the following:
- install HBase
- wipe /hbase on HDFS
- start hbase, run smoke test, kill all servers

Notably, it wasn't clearing ZK between runs, so some leftover RIT data from a previous HBase incarnation may be confusing this one's master.

[jira] Commented: (HBASE-3637) Region stuck in OPENED state

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006614#comment-13006614 ] 

Todd Lipcon commented on HBASE-3637:
------------------------------------

2011-03-11 06:42:58,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode /hbase/unassigned/1028785192 and set watcher; region=.META.,,1, server=trek08.sf.cloudera.com,60020,1299853933073, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,301 INFO org.apache.hadoop.hbase.master.AssignmentManager: Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,302 WARN org.apache.hadoop.hbase.master.AssignmentManager: Region in transition 1028785192 references a server no longer up trek08.sf.cloudera.com,60020,1299853933073; letting RIT timeout so will be assigned elsewhere
2011-03-11 06:42:58,304 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22ea55e0f670002 Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/70236052
2011-03-11 06:42:58,305 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode /hbase/unassigned/70236052 and set watcher; region=-ROOT-,,0, server=trek10.sf.cloudera.com,60020,1299854562169, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,305 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=trek10.sf.cloudera.com,60020,1299854562169, region=70236052/-ROOT-
2011-03-11 06:42:58,307 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for 70236052; deleting unassigned node
2011-03-11 06:42:58,308 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22ea55e0f670002 Deleting existing unassigned node for 70236052 that is in expected state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,313 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode /hbase/unassigned/70236052; data=region=-ROOT-,,0, server=trek10.sf.cloudera.com,60020,1299854562169, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,315 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:60000-0x22ea55e0f670002 Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/70236052
2011-03-11 06:42:58,315 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22ea55e0f670002 Successfully deleted unassigned node for region 70236052 in expected state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,316 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region -ROOT-,,0.70236052 on trek10.sf.cloudera.com,60020,1299854562169
2011-03-11 06:42:59,097 INFO org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition timed out:  .META.,,1.1028785192 state=OPENING, ts=1299854016886
2011-03-11 06:42:59,097 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2011-03-11 06:42:59,098 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode /hbase/unassigned/1028785192; data=region=.META.,,1, server=trek08.sf.cloudera.com,60020,1299853933073, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:59,099 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Region has transitioned to OPENED, allowing watched event handlers to process
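
To pull the region, server, and state out of ZKUtil lines like those above, a small parser can help when scanning a longer log. This is a minimal sketch that assumes the `region=..., server=..., state=...` layout seen in this excerpt; the regex and the function name are illustrative and not part of HBase.

```python
import re

# Matches the "region=..., server=..., state=..." tail of the ZKUtil lines.
# The layout is assumed from the log excerpt in this thread.
ZNODE_DATA = re.compile(
    r"region=(?P<region>.+?), server=(?P<server>.+?), state=(?P<state>\w+)"
)


def parse_znode_data(line):
    """Return a dict with region, server, and state, or None if the line has no znode data."""
    match = ZNODE_DATA.search(line)
    return match.groupdict() if match else None


# One of the log lines from this thread, split only for readability.
sample = (
    "2011-03-11 06:42:59,098 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: "
    "master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode "
    "/hbase/unassigned/1028785192; data=region=.META.,,1, "
    "server=trek08.sf.cloudera.com,60020,1299853933073, state=RS_ZK_REGION_OPENED"
)
parsed = parse_znode_data(sample)
print(parsed)
```

Comparing the parsed `server` value against the set of live region servers is one way to spot a znode that still points at a dead server, as the .META. znode does here.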


[jira] Commented: (HBASE-3637) Region stuck in OPENED state

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006738#comment-13006738 ] 

Todd Lipcon commented on HBASE-3637:
------------------------------------

sure
