Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2021/08/14 16:17:00 UTC

[jira] [Commented] (HBASE-24286) HMaster won't become healthy after cloning or creating a new cluster pointing at the same file system

    [ https://issues.apache.org/jira/browse/HBASE-24286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17399210#comment-17399210 ] 

Michael Stack commented on HBASE-24286:
---------------------------------------

Linking HBASE-26193, which has a nice summary of what the issue is here by [~zyork], w/ some color added by [~zhangduo].

> HMaster won't become healthy after cloning or creating a new cluster pointing at the same file system
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24286
>                 URL: https://issues.apache.org/jira/browse/HBASE-24286
>             Project: HBase
>          Issue Type: Bug
>          Components: master, Region Assignment
>    Affects Versions: 3.0.0-alpha-1, 2.2.3, 2.2.4, 2.2.5
>            Reporter: Jack Ye
>            Assignee: Tak-Lon (Stephen) Wu
>            Priority: Major
>
> h1. How to reproduce:
>  # user starts an HBase cluster on top of a file system
>  # user performs some operations and shuts down the cluster; all the data is still persisted in the file system
>  # user creates a new HBase cluster, using a different set of servers, on top of the same file system with the same root directory
>  # HMaster in the new cluster cannot finish initializing
> h1. Root cause:
> During the HMaster initialization phase, the following happens:
>  # HMaster waits for the namespace table to come online
>  # AssignmentManager loads the region info for all namespace table regions
>  # the region servers recorded as hosting the namespace table are already dead, so the online check fails
>  # HMaster keeps waiting for the namespace regions to come online, retrying up to 1000 times, which in practice means forever
> Code waiting for namespace table to be online: https://github.com/apache/hbase/blob/rel/2.2.3/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1102
> h1. Stack trace (running on S3):
> 2020-04-23 08:15:57,185 WARN [master/ip-10-12-13-14:16000:becomeActiveMaster] master.HMaster: hbase:namespace,,1587628169070.d34b65b91a52644ed3e77c5fbb065c2b. is NOT online; state={d34b65b91a52644ed3e77c5fbb065c2b state=OPEN, ts=1587629742129, server=ip-10-12-13-14.ec2.internal,16020,1587628031614}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
> where ip-10-12-13-14.ec2.internal is the old region server hosting the region of hbase:namespace.
> h1. Discussion for the fix
> We see there is a fix for this on branch-3: https://issues.apache.org/jira/browse/HBASE-21154. Before we provide a patch, we would like to know from the community whether we should backport this change to branch-2 or just make a fix with minimal code change.
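
For readers who want the holding pattern from the "Root cause" section in concrete terms, here is a minimal, self-contained Java sketch. It is not HBase code: the class `NamespaceWaitSketch`, its `RegionState` holder, and the methods `isOnline` and `waitForOnline` are hypothetical names made up for illustration, and the new cluster's server name is invented. It only models the logic described above: the persisted state says the region is OPEN on the old server, that server is not among the new cluster's live servers, and nothing changes between retries, so every attempt fails.

```java
import java.util.Collections;
import java.util.Set;

/** Hypothetical, simplified model of the startup wait. Not HBase's actual code. */
public class NamespaceWaitSketch {

    /** What the persisted state records about one region (illustrative holder). */
    public static final class RegionState {
        final String encodedName;
        final String state;   // e.g. "OPEN"
        final String server;  // server name recorded by the previous cluster

        public RegionState(String encodedName, String state, String server) {
            this.encodedName = encodedName;
            this.state = state;
            this.server = server;
        }
    }

    /** A region counts as online only if it is OPEN on a server that is alive now. */
    public static boolean isOnline(RegionState rs, Set<String> liveServers) {
        return "OPEN".equals(rs.state) && liveServers.contains(rs.server);
    }

    /**
     * Polls up to maxAttempts times. Returns the attempt number on which the
     * region was seen online, or -1 if it never came online.
     */
    public static int waitForOnline(RegionState rs, Set<String> liveServers, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isOnline(rs, liveServers)) {
                return attempt;
            }
            // A real master would sleep and log here. In this model nothing
            // reassigns the region, so the next attempt sees the same state.
        }
        return -1;
    }

    public static void main(String[] args) {
        // Region and old server name taken from the log line in the report.
        RegionState namespaceRegion = new RegionState(
            "d34b65b91a52644ed3e77c5fbb065c2b", "OPEN",
            "ip-10-12-13-14.ec2.internal,16020,1587628031614");

        // Assumption: a made-up server name standing in for the new cluster.
        Set<String> newCluster = Collections.singleton("new-rs-1.example,16020,1587630000000");

        System.out.println("attempt that succeeded: "
            + waitForOnline(namespaceRegion, newCluster, 1000));
    }
}
```

In this model the wait only ends if something moves the region onto a live server. The log line above reports ServerCrashProcedures=false, which is consistent with nothing having been scheduled to do that for the old server.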


