Posted to issues@hbase.apache.org by "Jean-Daniel Cryans (JIRA)" <ji...@apache.org> on 2013/07/10 02:45:50 UTC

[jira] [Updated] (HBASE-8910) TestReplicationDisableInactivePeer fails if the master we shutdown comes back to life

     [ https://issues.apache.org/jira/browse/HBASE-8910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-8910:
--------------------------------------

    Attachment: HBASE-8910.patch

Attaching a patch that checks in abortNow whether, apart from not being the primary master, we are also supposed to stop. In that case it continues to abort and calls stop (again), which should exit quickly.
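
To illustrate the idea, here is a minimal sketch of the check described above. It is not the contents of HBASE-8910.patch: the class MasterAbortSketch, its fields, the sessionExpired parameter and the tryRecoveringExpiredSession placeholder are all hypothetical stand-ins, and the real HMaster code may differ.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified stand-in for the master; all names are illustrative only.
class MasterAbortSketch {
    private final boolean isActiveMaster;
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private int recoveryAttempts = 0;

    MasterAbortSketch(boolean isActiveMaster) {
        this.isActiveMaster = isActiveMaster;
    }

    // Marks the master as stopping; safe to call more than once.
    void stop(String why) {
        stopped.set(true);
    }

    boolean isStopped() {
        return stopped.get();
    }

    int getRecoveryAttempts() {
        return recoveryAttempts;
    }

    // Placeholder for ZooKeeper session recovery (assumption: always succeeds here).
    private boolean tryRecoveringExpiredSession() {
        recoveryAttempts++;
        return true;
    }

    // Returns true if the abort should proceed, false if the master recovered instead.
    boolean abortNow(String msg, boolean sessionExpired) {
        if (!isActiveMaster || stopped.get()) {
            // Not the primary, or already asked to stop: never try to recover.
            stop("Aborting: " + msg);
            return true;
        }
        if (sessionExpired && tryRecoveringExpiredSession()) {
            // Session recovered, so the master keeps running.
            return false;
        }
        stop("Aborting: " + msg);
        return true;
    }

    public static void main(String[] args) {
        MasterAbortSketch master = new MasterAbortSketch(true);
        // Running primary: takes the recovery path.
        System.out.println(master.abortNow("session expired", true));
        master.stop("test shutdown");
        // Already stopping: skips recovery and aborts.
        System.out.println(master.abortNow("session expired", true));
    }
}
```

The point of the sketch is only the first condition in abortNow: once the stopped flag is set, a session expiry no longer triggers recovery, so a master that was deliberately shut down should not come back to life.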
                
> TestReplicationDisableInactivePeer fails if the master we shutdown comes back to life
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-8910
>                 URL: https://issues.apache.org/jira/browse/HBASE-8910
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>         Attachments: HBASE-8910.patch
>
>
> Here's a case where TestReplicationDisableInactivePeer fails while re-starting the second master:
> http://54.241.6.143/job/HBase-0.95-Hadoop-2/574/org.apache.hbase$hbase-server/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationDisableInactivePeer/testDisableInactivePeer/
> The reason is that when we first shut down the master, it comes back to life thinking it just lost its session:
> {noformat}
> 2013-07-07 04:27:03,989 FATAL [pool-1-thread-1-EventThread] master.HMaster(2062): Master server abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
> 2013-07-07 04:27:03,989 INFO  [pool-1-thread-1-EventThread] master.HMaster(2155): Primary Master trying to recover from ZooKeeper session expiry.
> {noformat}
> After that it tries to assign .META., which fails since the RS are down.
> One way I think we can prevent this is to skip recovering the session if we are stopping.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira