Posted to issues@hbase.apache.org by "Michael Stack (Jira)" <ji...@apache.org> on 2020/01/09 22:33:00 UTC
[jira] [Comment Edited] (HBASE-23286) Improve MTTR: Split WAL to HFile

    [ https://issues.apache.org/jira/browse/HBASE-23286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012272#comment-17012272 ] 

Michael Stack edited comment on HBASE-23286 at 1/9/20 10:32 PM:
----------------------------------------------------------------

[~zghao] does this work for you?

I enabled hbase.wal.split.to.hfile by setting it to true. I killed a few servers. The SCP logging shows this for the split log steps...
{code}
2020-01-09 22:17:55,346 DEBUG org.apache.hadoop.hbase.master.MasterWalManager: Renamed region directory: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting
2020-01-09 22:17:55,347 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [h5,16020,1578604825302]
2020-01-09 22:17:55,351 INFO org.apache.hadoop.hbase.master.SplitLogManager: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting dir is empty, no logs to split.
2020-01-09 22:17:55,355 INFO org.apache.hadoop.hbase.master.SplitLogManager: Finished splitting (more than or equal to) 0 (0 bytes) in 0 log files in [hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting] in 0ms
2020-01-09 22:17:55,356 DEBUG org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Done splitting WALs pid=123301, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, locked=true; ServerCrashProcedure server=h5,16020,1578604825302, splitWal=true, meta=false
{code}
The dir had 50-odd WALs in it, but after the above runs they are all gone. The above runs too quickly. No instances of recovered.edits in my fs.
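
For anyone wanting to check their own master log for the same symptom, here is a minimal sketch in Python. It assumes the SplitLogManager summary line has the same shape as in the log above; the regex and the helper name empty_splits are illustrative only and are not part of HBase.

```python
import re

# Pattern for the SplitLogManager summary line, modelled on the log excerpt above.
# The first size field is matched loosely because it may be a human-readable size.
SUMMARY = re.compile(
    r"SplitLogManager: Finished splitting \(more than or equal to\) "
    r"(?P<human>.+?) \((?P<bytes>\d+) bytes\) in (?P<files>\d+) log files "
    r"in \[(?P<dir>[^\]]+)\] in (?P<ms>\d+)ms"
)


def empty_splits(lines):
    """Return the -splitting directories for which zero WAL files were split."""
    hits = []
    for line in lines:
        match = SUMMARY.search(line)
        if match and int(match.group("files")) == 0:
            hits.append(match.group("dir"))
    return hits


sample = [
    "2020-01-09 22:17:55,355 INFO org.apache.hadoop.hbase.master.SplitLogManager: "
    "Finished splitting (more than or equal to) 0 (0 bytes) in 0 log files in "
    "[hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting] in 0ms",
]
print(empty_splits(sample))
```

A directory that shows up here even though it held WALs before the crash would match what I am seeing.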

Let me look at patch...


was (Author: stack):
[~zghao] does this work for you?

I enabled hbase.wal.split.to.hfile by setting it to true. I killed a few servers. The SCP logging shows this for the split log steps...

2020-01-09 22:17:55,346 DEBUG org.apache.hadoop.hbase.master.MasterWalManager: Renamed region directory: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting
2020-01-09 22:17:55,347 INFO org.apache.hadoop.hbase.master.SplitLogManager: dead splitlog workers [h5,16020,1578604825302]
2020-01-09 22:17:55,351 INFO org.apache.hadoop.hbase.master.SplitLogManager: hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting dir is empty, no logs to split.
2020-01-09 22:17:55,355 INFO org.apache.hadoop.hbase.master.SplitLogManager: Finished splitting (more than or equal to) 0 (0 bytes) in 0 log files in [hdfs://nameservice1/hbase/genie/WALs/h5,16020,1578604825302-splitting] in 0ms
2020-01-09 22:17:55,356 DEBUG org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure: Done splitting WALs pid=123301, state=RUNNABLE:SERVER_CRASH_SPLIT_LOGS, locked=true; ServerCrashProcedure server=h5,16020,1578604825302, splitWal=true, meta=false

The dir had 50-odd WALs in it, but after the above runs they are all gone. The above runs too quickly.

Let me look at patch...

> Improve MTTR: Split WAL to HFile
> --------------------------------
>
>                 Key: HBASE-23286
>                 URL: https://issues.apache.org/jira/browse/HBASE-23286
>             Project: HBase
>          Issue Type: Improvement
>          Components: MTTR
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Guanghao Zhang
>            Assignee: Guanghao Zhang
>            Priority: Major
>             Fix For: 3.0.0, 2.3.0
>
>
> After HBASE-20724, the compaction event marker is no longer used during failover. So our new proposal is to split the WAL to HFiles to improve MTTR. It has 3 steps:
>  # Read WAL and write HFile to region’s column family’s recovered.hfiles directory.
>  # Open region.
>  # Bulkload the recovered.hfiles for every column family.
> The design doc is attached as a Google doc. Any suggestions are welcome.
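
The three steps in the description can be pictured with a toy, in-memory model. This is only a sketch under loud assumptions: it is not HBase code, the function names and the (seq_id, region, family, row, value) tuple layout are hypothetical, and a sorted Python list stands in for an HFile written under a column family's recovered.hfiles directory.

```python
from collections import defaultdict


def split_wal_to_hfiles(wal_entries):
    """Step 1: group WAL edits by (region, family) and sort each group by row.

    Each sorted list is a stand-in for one file in that family's
    recovered.hfiles directory. Ordering by (row, seq_id) is a simplification.
    """
    recovered = defaultdict(list)
    for seq_id, region, family, row, value in wal_entries:
        recovered[(region, family)].append((row, seq_id, value))
    return {key: sorted(cells) for key, cells in recovered.items()}


def open_region(region):
    """Step 2: open the region; here that is just an empty per-family store map."""
    return {"name": region, "stores": defaultdict(list)}


def bulkload_recovered(opened, recovered):
    """Step 3: attach every recovered file to the matching column family store."""
    for (region, family), cells in recovered.items():
        if region == opened["name"]:
            opened["stores"][family].append(cells)
    return opened


# Hypothetical WAL contents: (seq_id, region, family, row, value)
wal = [
    (1, "r1", "cf1", "b", "v1"),
    (2, "r1", "cf2", "a", "v2"),
    (3, "r1", "cf1", "a", "v3"),
    (4, "r2", "cf1", "z", "v4"),
]
recovered = split_wal_to_hfiles(wal)
region = bulkload_recovered(open_region("r1"), recovered)
print(dict(region["stores"]))
```

The point of the model is only the ordering: files are produced per column family before the region opens, and loading them happens after the open.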


