Posted to issues@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2011/07/22 18:35:58 UTC

[jira] [Commented] (HBASE-3801) Backup Master blocked when the HMaster Node Fail.

    [ https://issues.apache.org/jira/browse/HBASE-3801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13069607#comment-13069607 ] 

Ted Yu commented on HBASE-3801:
-------------------------------

Normally a patch should carry the JIRA number in its filename.

The patch changes the semantics of how ActiveMasterManager handles watcher.masterAddressZNode.
Consequently, the following assertion from TestActiveMasterManager would fail:
{code}
    assertFalse(activeMasterManager.clusterHasActiveMaster.get());
{code}
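
For anyone unfamiliar with the flag that assertion inspects, here is a minimal stand-alone sketch of the general pattern. It assumes the flag is an AtomicBoolean that is cleared, with waiters notified, when the master znode goes away. Only the field name clusterHasActiveMaster comes from the test; ToyActiveMasterTracker and its methods are hypothetical, no ZooKeeper is involved, and this is not the actual ActiveMasterManager code.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical toy model; NOT HBase's ActiveMasterManager.
public class ToyActiveMasterTracker {
    // Same name as the field checked by the assertion; used here as both flag and monitor.
    final AtomicBoolean clusterHasActiveMaster = new AtomicBoolean(false);

    // Stand-in for "the master address znode was created".
    void masterNodeCreated() {
        synchronized (clusterHasActiveMaster) {
            clusterHasActiveMaster.set(true);
        }
    }

    // Stand-in for "the master address znode was deleted": clear the flag and wake waiters.
    void masterNodeDeleted() {
        synchronized (clusterHasActiveMaster) {
            clusterHasActiveMaster.set(false);
            clusterHasActiveMaster.notifyAll();
        }
    }

    // Blocks until no active master is recorded; returns false on timeout or interrupt.
    boolean awaitNoActiveMaster(long timeoutMs) {
        long deadline = System.currentTimeMillis() + timeoutMs;
        synchronized (clusterHasActiveMaster) {
            while (clusterHasActiveMaster.get()) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                try {
                    clusterHasActiveMaster.wait(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ToyActiveMasterTracker tracker = new ToyActiveMasterTracker();
        tracker.masterNodeCreated();

        // Simulate the active master failing shortly after startup.
        Thread failure = new Thread(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            tracker.masterNodeDeleted();
        });
        failure.start();

        boolean released = tracker.awaitNoActiveMaster(5000);
        failure.join();
        System.out.println("released=" + released
                + ", clusterHasActiveMaster=" + tracker.clusterHasActiveMaster.get());
    }
}
```

In this toy model, a waiter is released only if the deletion handler both clears the flag and calls notifyAll(); a patch that changes when the flag is set or cleared would change what an assertion on it observes.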

Please produce a complete patch, run the following unit tests, and document your experience testing failover in a real cluster:
{code}
src/test/java/org/apache/hadoop/hbase/regionserver/TestMasterAddressManager.java
src/test/java/org/apache/hadoop/hbase/coprocessor/TestMasterObserver.java
src/test/java/org/apache/hadoop/hbase/master/TestMasterFailover.java
src/test/java/org/apache/hadoop/hbase/master/TestMaster.java
src/test/java/org/apache/hadoop/hbase/master/TestActiveMasterManager.java
src/test/java/org/apache/hadoop/hbase/master/TestMasterStatusServlet.java
src/test/java/org/apache/hadoop/hbase/master/TestMasterTransitions.java
src/test/java/org/apache/hadoop/hbase/master/TestMasterRestartAfterDisablingTable.java
src/test/java/org/apache/hadoop/hbase/master/TestHMasterRPCException.java
{code}

> Backup Master blocked when the HMaster Node Fail.
> -------------------------------------------------
>
>                 Key: HBASE-3801
>                 URL: https://issues.apache.org/jira/browse/HBASE-3801
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.2, 0.90.3
>         Environment: 1 HMaster
> 1 HMaster -backup
> 6 HRegionServer
>            Reporter: Aaron Guo
>         Attachments: patch.txt
>
>
> When the HMaster crashes, the backup HMaster blocks while waiting for the ZK notification.
> The backup HMaster's thread stack is:
> "master-hp1:60000" prio=10 tid=0x00000000484c6800 nid=0x4b56 waiting on condition [0x0000000040209000]
>    java.lang.Thread.State: TIMED_WAITING (sleeping)
>         at java.lang.Thread.sleep(Native Method)
>         at org.apache.hadoop.hbase.master.HMaster.stallIfBackupMaster(HMaster.java:251)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:279)
>    Locked ownable synchronizers:
>         - None

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira