Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2018/12/23 01:24:00 UTC

[jira] [Commented] (HBASE-21627) race condition between a recovered RIT for meta replica, and master startup

    [ https://issues.apache.org/jira/browse/HBASE-21627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16727766#comment-16727766 ] 

Duo Zhang commented on HBASE-21627:
-----------------------------------

Agreed that we do not need to wait for non-default meta replicas to be online when starting HMaster.

But I'm afraid that the old code just wants to confirm that the meta replicas can work. I can see lots of comments in the code base along the lines of "TODO: handle meta replicas". So I think we'd better work out a complete solution for meta replicas here, i.e., how to initialize meta replicas for a fresh cluster, how to increase or decrease the number of meta replicas, and how to deal with meta replicas in SCP and TRSP.
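
To make the "do not wait" point concrete, here is a minimal sketch of one way master startup could decide which meta replicas still need an assign, skipping any replica that a restored procedure has already opened or is opening instead of failing on it. This is a toy model under stated assumptions: the class, enum, and method names are hypothetical and are not the actual AssignmentManager or MasterMetaBootstrap code. The only thing taken from HBase is that the default replica has replica id 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: these types are hypothetical and are not HBase classes.
class MetaReplicaStartupSketch {
    enum State { OFFLINE, OPENING, OPEN }

    // Returns the replica ids that startup should still submit for assignment.
    // Replica 0 (the default replica) is assumed to be handled elsewhere.
    // Replicas that are already OPEN or OPENING (for example because a
    // restored procedure got there first) are skipped rather than treated
    // as an unexpected state.
    static List<Integer> replicasToAssign(Map<Integer, State> states) {
        List<Integer> pending = new ArrayList<>();
        for (Map.Entry<Integer, State> e : states.entrySet()) {
            if (e.getKey() != 0 && e.getValue() == State.OFFLINE) {
                pending.add(e.getKey());
            }
        }
        return pending;
    }

    public static void main(String[] args) {
        Map<Integer, State> states = new TreeMap<>();
        states.put(0, State.OPEN);
        states.put(1, State.OPEN);    // already opened by a recovered procedure
        states.put(2, State.OFFLINE); // still needs an assign
        // Startup would submit assigns for these ids and continue without blocking.
        System.out.println(replicasToAssign(states));
    }
}
```

This only covers the startup side; it does not address how replicas are created on a fresh cluster, how the replica count changes, or how SCP and TRSP should treat them.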

> race condition between a recovered RIT for meta replica, and master startup
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-21627
>                 URL: https://issues.apache.org/jira/browse/HBASE-21627
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> Master recovers RIT for a meta replica
> {noformat}
> 2018-12-14 23:16:12,008 INFO  [master/...:17000:becomeActiveMaster] assignment.AssignmentManager: Attach pid=83796, ppid=83788, state=RUNNABLE:REGION_STATE_TRANSITION_OPEN, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=(region), ASSIGN to rit=OFFLINE, location=null, table=hbase:meta, region=(region) to restore RIT
> 2018-12-14 23:16:16,475 WARN  [PEWorker-8] assignment.TransitRegionStateProcedure: No location specified for {ENCODED => (region), NAME => 'hbase:meta,,1_0001', STARTKEY => '', ENDKEY => '', REPLICA_ID => 1}, jump back to state REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE to get one
> ...
> 2018-12-14 23:16:30,010 INFO  [PEWorker-16] procedure2.ProcedureExecutor: Finished pid=83796, ppid=83788, state=SUCCESS, hasLock=false; TransitRegionStateProcedure table=hbase:meta, region=(region), ASSIGN in 8mins, 23.39sec
> {noformat}
> Then it tries to assign the replicas:
> {noformat}
> 2018-12-14 23:16:36,091 ERROR [master/...:17000:becomeActiveMaster] master.HMaster: Failed to become active master
> org.apache.hadoop.hbase.client.DoNotRetryRegionException: Unexpected state for rit=OPEN, location=server,17020,1544858156805, table=hbase:meta, region=(region)
>                 at org.apache.hadoop.hbase.master.assignment.AssignmentManager.preTransitCheck(AssignmentManager.java:548)
>                 at org.apache.hadoop.hbase.master.assignment.AssignmentManager.assign(AssignmentManager.java:563)
>                 at org.apache.hadoop.hbase.master.MasterMetaBootstrap.assignMetaReplicas(MasterMetaBootstrap.java:84)
>                 at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:1146)
> {noformat}
> Unfortunately I misplaced the log from this after copy-pasting a grep result so that's all I have for this.


