Posted to issues@hbase.apache.org by "Zach York (JIRA)" <ji...@apache.org> on 2018/06/15 17:58:00 UTC

[jira] [Updated] (HBASE-20723) Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.

     [ https://issues.apache.org/jira/browse/HBASE-20723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zach York updated HBASE-20723:
------------------------------
    Summary: Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.  (was: WALSplitter uses the rootDir, which is walDir, as the recovered edits root path)

> Custom hbase.wal.dir results in dataloss because we write recovered edits into a different place than where the recovering region server looks for them.
> --------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20723
>                 URL: https://issues.apache.org/jira/browse/HBASE-20723
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery, wal
>    Affects Versions: 1.4.0, 1.4.1, 1.4.2, 1.4.3, 1.4.4, 2.0.0
>            Reporter: Rohan Pednekar
>            Assignee: Ted Yu
>            Priority: Critical
>         Attachments: 20723.v1.txt, 20723.v2.txt, 20723.v3.txt, 20723.v4.txt, 20723.v5.txt, 20723.v5.txt, 20723.v6.txt, 20723.v7.txt, 20723.v8.txt, 20723.v9.txt, logs.zip
>
>
> This is an Azure HDInsight HBase cluster with HDP 2.6 and HBase 1.1.2.2.6.3.2-14.
> By default the underlying data goes to wasb://xxxxx@yyyyy/hbase.
> I tried to move the WAL folders to HDFS, which is backed by the SSD mounted on each VM at /mnt.
> hbase.wal.dir=hdfs://mycluster/walontest
> hbase.wal.dir.perms=700
> hbase.rootdir.perms=700
> hbase.rootdir=wasb://XYZ@hbaseperf.core.net/hbase
> Procedure to reproduce this issue:
> 1. create a table in hbase shell
> 2. insert a row in hbase shell
> 3. reboot the VM which hosts that region
> 4. scan the table in hbase shell and it is empty
> Looking at the region server logs:
> {code:java}
> 2018-06-12 22:08:40,455 INFO  [RS_LOG_REPLAY_OPS-wn2-duohba:16020-0-Writer-1] wal.WALSplitter: This region's directory doesn't exist: hdfs://mycluster/walontest/data/default/tb1/b7fd7db5694eb71190955292b3ff7648. It is very likely that it was already split so it's safe to discard those edits.
> {code}
> The log split/replay ignored the actual WAL because WALSplitter looks for the region directory under the hbase.wal.dir we specified rather than under hbase.rootdir.
> Looking at the source code
> (https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/wal/WALSplitter.java), it uses rootDir, which here is actually walDir, as the root path for tableDir.
> So when HBASE-17437 is used and the WAL dir and the HBase root dir are on different paths, or even on different filesystems, #5 using walDir as tableDir is clearly wrong.
> CC: [~zyork], [~yuzhihong@gmail.com] Attached the logs for quick review.
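
The path mismatch described above can be illustrated with a small, self-contained sketch. It is not the WALSplitter code: the class name and the regionDir helper are hypothetical, plain strings stand in for Hadoop paths, and the data/<namespace>/<table>/<encoded region> layout is assumed from the log line quoted in the description. The root dir value is the placeholder from the report.

```java
public class RecoveredEditsPathSketch {

    // Hypothetical helper (not an HBase API): appends the
    // data/<namespace>/<table>/<encodedRegion> layout seen in the log line
    // to whichever root directory it is given.
    static String regionDir(String root, String namespace, String table, String encodedRegion) {
        String base = root.endsWith("/") ? root.substring(0, root.length() - 1) : root;
        return base + "/data/" + namespace + "/" + table + "/" + encodedRegion;
    }

    public static void main(String[] args) {
        String rootDir = "wasb://xxxxx@yyyyy/hbase";   // hbase.rootdir (placeholder from the report)
        String walDir = "hdfs://mycluster/walontest";  // hbase.wal.dir
        String region = "b7fd7db5694eb71190955292b3ff7648";

        // Directory resolved when the WAL dir is mistakenly used as the root.
        String checked = regionDir(walDir, "default", "tb1", region);
        // Directory resolved against the configured root dir.
        String expected = regionDir(rootDir, "default", "tb1", region);

        System.out.println("resolved against hbase.wal.dir: " + checked);
        System.out.println("resolved against hbase.rootdir: " + expected);
        System.out.println("same location: " + checked.equals(expected));
    }
}
```

Resolved against hbase.wal.dir, the helper yields the same directory that the log message reports as missing. Resolved against hbase.rootdir, it yields a path on a different filesystem.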



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)