Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/06/25 19:05:00 UTC

[jira] [Resolved] (HBASE-24585) Failed start recovering crash in standalone mode if procedure-based distributed WAL split & hbase.wal.split.to.hfile=true

     [ https://issues.apache.org/jira/browse/HBASE-24585?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Stack resolved HBASE-24585.
-----------------------------------
    Fix Version/s: 2.3.0
                   3.0.0-alpha-1
         Assignee: Michael Stack
       Resolution: Not A Problem

Resolving as no longer a problem after HBASE-24616 went in.

> Failed start recovering crash in standalone mode if procedure-based distributed WAL split & hbase.wal.split.to.hfile=true
> -------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-24585
>                 URL: https://issues.apache.org/jira/browse/HBASE-24585
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.3.0
>
>
> (This description got redone after I figured out what was going on. Previously it was just a litany of me banging around trying to learn procedure-based WAL splitting and hbase.wal.split.to.hfile; no one needs to read that; hence the refactor).
> HBASE-24574 procedure-based distributed WAL splitting is enabled, and split-to-hfile too. A forced crash requires recovery, with ServerCrashProcedure splitting old WALs on restart. The recovery fails because it gets stuck: the Master can't assign meta because meta is being recovered, and the recovery can't make progress because it asks for the table descriptor for meta (needed by the hbase.wal.split.to.hfile feature) while the Master is not yet initialized. After the default timeout, the Master shuts down because it can't initialize.
> {code}
>  2020-06-18 19:53:54,175 ERROR [main] master.HMasterCommandLine: Master exiting
>  java.lang.RuntimeException: Master not initialized after 200000ms
>    at org.apache.hadoop.hbase.util.JVMClusterUtil.waitForEvent(JVMClusterUtil.java:232)
>    at org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:200)
>    at org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:430)
>    at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:232)
>    at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>    at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
>    at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:3059)
> {code}
> The Master abort interrupts other ongoing actions, so later in the log the WAL split shows as interrupted:
> {code}
>  2020-06-17 21:20:37,472 ERROR [RS_LOG_REPLAY_OPS-regionserver/localhost:16020-0] handler.RSProcedureHandler: Error when call RSProcedureCallable:
>  java.io.IOException: Failed WAL split, status=RESIGNED, wal=file:/Users/stack/checkouts/hbase.apache.git/tmp/hbase/WALs/localhost,16020,1592440848604-splitting/localhost%2C16020%2C1592440848604.meta.1592440852959.meta
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.splitWal(SplitWALCallable.java:106)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:86)
>    at org.apache.hadoop.hbase.regionserver.SplitWALCallable.call(SplitWALCallable.java:49)
>    at org.apache.hadoop.hbase.regionserver.handler.RSProcedureHandler.process(RSProcedureHandler.java:49)
>    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
>    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>    at java.lang.Thread.run(Thread.java:748)
> {code}
> This issue becomes how to make hbase.wal.split.to.hfile work in standalone mode.
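
For readers trying to reproduce the setup described above, the combination of settings might look roughly like the hbase-site.xml fragment below. This is a sketch, not taken from the report: only hbase.wal.split.to.hfile is named in the description. The property used here to select procedure-based WAL splitting (hbase.split.wal.zk.coordinated) and the explicit standalone setting (hbase.cluster.distributed) are assumptions and should be checked against the HBase version in use.

```xml
<configuration>
  <!-- Standalone mode: all daemons run in a single JVM. Assumed here; false is the usual default. -->
  <property>
    <name>hbase.cluster.distributed</name>
    <value>false</value>
  </property>
  <!-- Named in the report: the WAL-split feature that needs the meta table descriptor during recovery. -->
  <property>
    <name>hbase.wal.split.to.hfile</name>
    <value>true</value>
  </property>
  <!-- Assumption: not named in the report. Setting this to false is understood to
       select procedure-based WAL splitting instead of ZooKeeper-coordinated splitting. -->
  <property>
    <name>hbase.split.wal.zk.coordinated</name>
    <value>false</value>
  </property>
</configuration>
```

With settings like these, a forced crash followed by a restart would exercise the recovery path the description says gets stuck.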



--
This message was sent by Atlassian Jira
(v8.3.4#803005)