Posted to issues@hbase.apache.org by "Nick Dimiduk (JIRA)" <ji...@apache.org> on 2014/11/12 23:46:34 UTC

[jira] [Updated] (HBASE-12467) Master joins cluster but never completes initialization

     [ https://issues.apache.org/jira/browse/HBASE-12467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Dimiduk updated HBASE-12467:
---------------------------------
    Attachment: HBASE-12467.00.patch

Here's a fix that at least detects this kind of situation, with an option to let the operator follow the Erlang approach ("let it crash") and abort the stuck master.
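A minimal sketch of that detection, assuming a simple watchdog thread that waits a bounded time for initialization to finish. The class `InitializationWatchdog`, its timeout and halt-on-timeout parameters, and the abort callback are hypothetical illustrations, not the contents of HBASE-12467.00.patch or HBase's actual API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical watchdog (not HBase's actual class) that detects a master stuck
 * in initialization. If initialization does not finish within the timeout, it
 * reports the hang and, when haltOnTimeout is set, runs an abort action
 * ("let it crash") so that another master can take over.
 */
public class InitializationWatchdog {
    private final CountDownLatch initialized = new CountDownLatch(1);
    private final long timeoutMillis;
    private final boolean haltOnTimeout;
    private final Runnable abortAction;
    private final Thread monitor;
    private volatile boolean timedOut = false;

    public InitializationWatchdog(long timeoutMillis, boolean haltOnTimeout, Runnable abortAction) {
        this.timeoutMillis = timeoutMillis;
        this.haltOnTimeout = haltOnTimeout;
        this.abortAction = abortAction;
        this.monitor = new Thread(this::watch, "InitializationWatchdog");
        this.monitor.setDaemon(true); // never keep the JVM alive on its own
    }

    public void start() { monitor.start(); }

    /** Called by the initialization thread once initialization completes. */
    public void markInitialized() { initialized.countDown(); }

    public boolean hasTimedOut() { return timedOut; }

    /** Waits until the watchdog has finished its single check. */
    public void join() throws InterruptedException { monitor.join(); }

    private void watch() {
        try {
            // await returns false if the latch was not released before the timeout.
            if (!initialized.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                timedOut = true;
                System.err.println("Initialization did not complete within " + timeoutMillis + " ms");
                if (haltOnTimeout) {
                    abortAction.run();
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // A master whose initialization hangs: markInitialized() is never called.
        InitializationWatchdog stuck = new InitializationWatchdog(100, true,
                () -> System.err.println("Aborting: releasing the active master role"));
        stuck.start();
        stuck.join();
        System.out.println("stuck timedOut=" + stuck.hasTimedOut()); // prints: stuck timedOut=true

        // A healthy master that finishes initialization in time.
        InitializationWatchdog healthy = new InitializationWatchdog(1000, true, () -> {});
        healthy.start();
        healthy.markInitialized();
        healthy.join();
        System.out.println("healthy timedOut=" + healthy.hasTimedOut()); // prints: healthy timedOut=false
    }
}
```

Keeping the abort as an injected callback makes the "detect only" and "let it crash" modes a configuration choice; in a real process the callback might be something like `() -> Runtime.getRuntime().halt(1)`, which releases the active master role by terminating the JVM.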

> Master joins cluster but never completes initialization
> -------------------------------------------------------
>
>                 Key: HBASE-12467
>                 URL: https://issues.apache.org/jira/browse/HBASE-12467
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 2.0.0, 0.98.9, 0.99.2
>
>         Attachments: HBASE-12467.00.patch
>
>
> While diagnosing a rare failure in IntegrationTestLoadAndVerify, I discovered this scenario. Master was restarted by CM (ChaosMonkey). Upon rejoining the cluster, it successfully assumed responsibility as active master, but apparently the finishInitialization method never completed. The last log line from that thread is:
> {noformat}
> 2014-11-10 17:01:29,940 INFO  [master:ip-172-31-9-135:60000] master.HMaster: hbase:meta with replicaId 0 assigned=0, rit=false, location=ip-172-31-9-136.ec2.internal,60020,1415638551951
> {noformat}
> I see region states populated from existing znodes. AM (AssignmentManager) inventoried the online regions and acknowledged that this was a master failover. There it sits, responding to RPCs with {{PleaseHoldException: Master is initializing}}.
> For the sake of resiliency, we should detect this scenario and at least release control as active master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)