Posted to dev@hbase.apache.org by "bolao (Jira)" <ji...@apache.org> on 2021/09/16 02:51:00 UTC

[jira] [Created] (HBASE-26287) Master initialization cannot complete when the hbase:namespace region is not online

bolao created HBASE-26287:
-----------------------------

             Summary: Master initialization cannot complete when the hbase:namespace region is not online
                 Key: HBASE-26287
                 URL: https://issues.apache.org/jira/browse/HBASE-26287
             Project: HBase
          Issue Type: Improvement
          Components: master
    Affects Versions: 2.3.5
            Reporter: bolao


After an HBase cluster shuts down unexpectedly and is restarted, we sometimes find that the master cannot finish initializing because it is stuck in the isRegionOnline method for hbase:namespace. From the master logs, we found that the master and the meta table both consider the hbase:namespace region online, but its region server is dead. isRegionOnline logs this once a minute and takes no other action. I think we could remove the region's record from the AssignmentManager's RegionState and assign hbase:namespace to another region server, so that the HBase cluster recovers without human intervention. I would like your advice: what do you think?
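To make the proposal concrete, here is a minimal sketch of the check and reassignment described above. It is not HBase code: `RegionRecord`, `State`, `isStale`, and `reassignIfStale` are hypothetical names that only mirror the `state=` and `server=` fields printed by isRegionOnline in the master log, and picking the first live server stands in for whatever placement the master would really use.

```java
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical, simplified model of the proposed recovery.
 * None of these types are HBase classes.
 */
public class StaleRegionRecoverySketch {

    enum State { OPEN, OPENING }

    /** What the master believes about a region: its state and hosting server. */
    static final class RegionRecord {
        State state;
        String server; // "host,port,startcode" as printed in the log, or null

        RegionRecord(State state, String server) {
            this.state = state;
            this.server = server;
        }
    }

    /** A record is stale when it says OPEN but names a server that is not live. */
    static boolean isStale(RegionRecord r, Set<String> liveServers) {
        return r.state == State.OPEN
            && (r.server == null || !liveServers.contains(r.server));
    }

    /**
     * Drops the stale location and points the region at a live server.
     * Returns the chosen server, or null when there is nothing to do.
     */
    static String reassignIfStale(RegionRecord r, Set<String> liveServers) {
        if (liveServers.isEmpty() || !isStale(r, liveServers)) {
            return null;
        }
        // Assumption: any live server is acceptable; a real master would
        // ask its balancer and run the assignment as a procedure.
        String target = liveServers.iterator().next();
        r.state = State.OPENING; // OPEN again only after the server confirms
        r.server = target;
        return target;
    }

    public static void main(String[] args) {
        // Region location taken from the master log in this report.
        RegionRecord namespace = new RegionRecord(State.OPEN,
            "fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870");

        // Hypothetical live server name, for illustration only.
        Set<String> live = new TreeSet<>();
        live.add("live-regionserver.example,16020,1630400000000");

        System.out.println("stale: " + isStale(namespace, live));
        System.out.println("reassigned to: " + reassignIfStale(namespace, live));
    }
}
```

The sketch treats "recorded as OPEN on a server that is not in the live set" as the trigger. That corresponds to the `state=OPEN ... ServerCrashProcedures=false` situation in the log, where nothing is scheduled that would ever move the region.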
{panel:title=the logs of master}
2021-09-02 18:32:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:33:01 [master/fx-hd-sc-hbase-master-0:16000.Chore.1] INFO org.apache.hadoop.hbase.ChoreService.scheduleChore(157) -Chore ScheduledChore name=fx-hd-sc-hbase-master-0.fx-hd-sc.fx-ns.svc.cluster.xjht,16000,1630401705440-ClusterStatusChore, period=60000, unit=MILLISECONDS is enabled.
2021-09-02 18:33:01 [master/fx-hd-sc-hbase-master-0:16000.Chore.1] INFO org.apache.hadoop.hbase.ScheduledChore.run(172) -Chore: fx-hd-sc-hbase-master-0.fx-hd-sc.fx-ns.svc.cluster.xjht,16000,1630401705440-ClusterStatusChore missed its start time
2021-09-02 18:33:41 [ProcExecTimeout] INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager.periodicExecute(1334) -Found 0 OPEN regions on dead servers and 177568 OPEN regions on unknown servers
2021-09-02 18:33:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:34:31 [qtp780802740-4192] INFO http.requests.master.write(60) -15.22.70.168 - - [02/Sep/2021:10:34:31 +0000] "GET //15.22.70.168:1601/master-status HTTP/1.1" 200 54124 
2021-09-02 18:34:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:34:51 [qtp780802740-4202] INFO http.requests.master.write(60) -15.22.70.168 - - [02/Sep/2021:10:34:51 +0000] "GET //15.22.70.168:1601/master-status HTTP/1.1" 200 54122 
2021-09-02 18:35:41 [ProcExecTimeout] INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager.periodicExecute(1334) -Found 0 OPEN regions on dead servers and 177568 OPEN regions on unknown servers
2021-09-02 18:35:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:36:20 [qtp780802740-4192] INFO http.requests.master.write(60) -15.22.70.168 - - [02/Sep/2021:10:36:20 +0000] "GET //15.22.70.168:1601/master-status HTTP/1.1" 200 54122 
2021-09-02 18:36:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:36:57 [RSProcedureDispatcher-pool4-t23] WARN org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.scheduleForRetry(323) -request to fx-hd-sc-hbase-slave-15.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1630405458162 failed due to org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to fx-hd-sc-hbase-slave-15.fx-hd-sc.fx-ns.svc.cluster.xjht/172.49.9.38:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=6192,methodName=ExecuteProcedures], waitTime=600008, rpcTimeout=600000, try=7, retrying...
2021-09-02 18:37:42 [ProcExecTimeout] INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager.periodicExecute(1334) -Found 0 OPEN regions on dead servers and 177568 OPEN regions on unknown servers
2021-09-02 18:37:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:38:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.
2021-09-02 18:38:49 [zk-event-processor-pool1-t1] INFO org.apache.hadoop.hbase.security.token.ZKSecretWatcher.nodeDeleted(94) -Node deleted id=168
2021-09-02 18:39:42 [ProcExecTimeout] INFO org.apache.hadoop.hbase.master.assignment.AssignmentManager.periodicExecute(1334) -Found 0 OPEN regions on dead servers and 177568 OPEN regions on unknown servers
2021-09-02 18:39:46 [master/fx-hd-sc-hbase-master-0:16000:becomeActiveMaster] WARN org.apache.hadoop.hbase.master.HMaster.isRegionOnline(1229) -hbase:namespace,,1477036226969.c0b2d4af686dc6b1c98dd9c866fe7607. is NOT online; state=\{c0b2d4af686dc6b1c98dd9c866fe7607 state=OPEN, ts=1630577738198, server=fx-hd-sc-hbase-slave-10.fx-hd-sc.fx-ns.svc.cluster.xjht,16020,1628591903870}; ServerCrashProcedures=false. Master startup cannot progress, in holding-pattern until region onlined.

 
{panel}

{panel:title=the code of master}
https://github.com/apache/hbase/blob/fd3fdc08d1cd43eb3432a1a70d31c3aece6ecabe/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L1214
{panel}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)