Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/12/23 01:27:00 UTC

[jira] [Commented] (HBASE-21624) master startup should not wait (or die) on assigning meta replicas

    [ https://issues.apache.org/jira/browse/HBASE-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727769#comment-16727769 ] 

Duo Zhang commented on HBASE-21624:
-----------------------------------

Agree that we do not need to wait for non-default meta replicas to be online when starting HMaster.

But I'm afraid that the old code just wants to confirm that the meta replicas can work. I can see lots of comments in the code base along the lines of "TODO: handle meta replicas". So I think we'd better work out a complete solution for meta replicas here, i.e., how to initialize meta replicas for a fresh new cluster, how to increase or decrease the number of meta replicas, and how to deal with meta replicas in SCP and TRSP.

> master startup should not wait (or die) on assigning meta replicas
> ------------------------------------------------------------------
>
>                 Key: HBASE-21624
>                 URL: https://issues.apache.org/jira/browse/HBASE-21624
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> Due to some other bug, a meta replica is stuck in transition forever. 
> Master is running fine without it, however the initializer thread hasn't finished initialization for ~19 hours now and is stuck in the below state.
> It doesn't seem necessary to wait for the replicas; the assignment could just be fire-and-forget, and normal region handling should take care of it after that.
> {noformat}
> Thread 118 (master/...:17000:becomeActiveMaster):
>   State: TIMED_WAITING
>   Blocked count: 281
>   Waited count: 67059
>   Stack:
>     java.lang.Thread.sleep(Native Method)
>     org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:209)
>     org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitFor(ProcedureSyncWait.java:192)
>     org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToComplete(ProcedureSyncWait.java:151)
>     org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.waitForProcedureToCompleteIOE(ProcedureSyncWait.java:140)
>     org.apache.hadoop.hbase.master.procedure.ProcedureSyncWait.submitAndWaitProcedure(ProcedureSyncWait.java:133)
>     org.apache.hadoop.hbase.master.assignment.AssignmentManager.assign(AssignmentManager.java:569)
>     org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:84)
>     org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1146)
>     org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2342)
> {noformat}
> Additionally, and semi-related, if the meta-hosting server dies during replica assignment, the master also dies immediately, which is unnecessary.
> {noformat}
> 2018-12-14 21:00:55,331 ERROR [master/...:17000:becomeActiveMaster] master.HMaster: Failed to become active master
> org.apache.hadoop.hbase.HBaseIOException: rit=OFFLINE, location=null, table=hbase:meta, region=534574363 is currently in transition
>                 at org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:545)
>                 at org.apache.hadoop.hbase.master.assignment.AssignmentManager.assign(AssignmentManager.java:563)
>                 at org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:84)
>                 at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1146)
>                 at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2342)
>                 at org.apache.hadoop.hbase.master.HMaster.lambda$run$0(HMaster.java:591)
> {noformat}
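
To illustrate the fire-and-forget idea from the issue description, here is a minimal Java sketch that uses only java.util.concurrent. It is not HBase code: the Assignment interface and submitReplicaAssignments are hypothetical stand-ins, and a real change would presumably go through AssignmentManager and the procedure framework. The sketch only shows the two behaviours asked for: the caller does not block on a stuck assignment, and a failed assignment is recorded instead of aborting startup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class FireAndForgetStartup {

    /** Hypothetical stand-in for one meta replica assignment; not an HBase API. */
    interface Assignment {
        void run() throws Exception;
    }

    /**
     * Submits every assignment to the pool and returns immediately.
     * Each future completes with true on success and false on failure;
     * exceptions are logged here and never rethrown to the caller.
     */
    static List<CompletableFuture<Boolean>> submitReplicaAssignments(
            List<Assignment> assignments, ExecutorService pool) {
        List<CompletableFuture<Boolean>> results = new ArrayList<>();
        for (Assignment assignment : assignments) {
            results.add(CompletableFuture.supplyAsync(() -> {
                try {
                    assignment.run();
                    return true;
                } catch (Exception e) {
                    System.err.println("replica assignment failed: " + e);
                    return false;
                }
            }, pool));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        CountDownLatch neverReleased = new CountDownLatch(1);

        List<Assignment> assignments = new ArrayList<>();
        assignments.add(() -> { });                       // assigns normally
        assignments.add(() -> {                           // fails, like the preTransitCheck case
            throw new IllegalStateException("region is currently in transition");
        });
        assignments.add(() -> neverReleased.await());     // stuck in transition forever

        List<CompletableFuture<Boolean>> results = submitReplicaAssignments(assignments, pool);

        // Startup continues here without waiting on any replica.
        System.out.println("initialization finished; submitted " + results.size() + " assignments");

        // Later, a background check could inspect the futures without blocking.
        TimeUnit.MILLISECONDS.sleep(200);
        for (int i = 0; i < results.size(); i++) {
            System.out.println("replica " + (i + 1) + " done=" + results.get(i).isDone());
        }
        pool.shutdownNow(); // interrupts the stuck task so the JVM can exit
    }
}
```

In this sketch the stuck assignment simply stays pending and the failing one resolves to false, so neither can stall or kill the thread that submitted them. Whether the replicas actually come online would still need to be confirmed elsewhere, which is the concern raised in the comment above.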



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)