Posted to issues@hbase.apache.org by "Zach York (JIRA)" <ji...@apache.org> on 2019/06/25 20:22:00 UTC

[jira] [Commented] (HBASE-22628) Data loss while migrating to custom WAL directory (hbase.wal.dir)

    [ https://issues.apache.org/jira/browse/HBASE-22628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16872685#comment-16872685 ] 

Zach York commented on HBASE-22628:
-----------------------------------

When the custom WAL directory was added, it was assumed to be a backwards-incompatible change that required a clean shutdown beforehand. However, maybe it is time to add some backwards compatibility?
We will never be able to handle every case here, since that would require knowing what the WAL dir had been set to previously, but maybe it would be enough to add a check of the default location.

The other option is to create a separate migration tool.
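
To make the "check the default location" idea concrete, here is a minimal sketch. It is not the HBase implementation or a proposed patch: it uses java.nio.file as a stand-in for the Hadoop FileSystem API so that it runs on its own, and the class and method names (WalDirFallback, candidateLogDirs, serverFoldersFromLogDirs) are hypothetical. It only shows the shape of the check: scan the WALs folder under the configured WAL root and, in addition, the WALs folder under the HBase root dir. As noted above, this does nothing for a cluster whose previous WAL dir was itself a custom location.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.TreeSet;

public class WalDirFallback {
    // Folder name used for WALs; /hbase/WALs in the issue description.
    public static final String LOG_DIR_NAME = "WALs";

    /**
     * Directories to scan for leftover server WAL folders: the configured
     * WAL root first, then the default location under the HBase root dir.
     * A set is used so that identical paths are only scanned once.
     */
    public static Set<Path> candidateLogDirs(Path rootDir, Path walRootDir) {
        Set<Path> dirs = new LinkedHashSet<>();
        dirs.add(walRootDir.resolve(LOG_DIR_NAME).normalize());
        dirs.add(rootDir.resolve(LOG_DIR_NAME).normalize());
        return dirs;
    }

    /**
     * Collects the names of per-server folders found in every candidate
     * log directory that exists. Missing directories are skipped.
     */
    public static Set<String> serverFoldersFromLogDirs(Path rootDir, Path walRootDir)
            throws IOException {
        Set<String> serverNames = new TreeSet<>();
        for (Path logsDir : candidateLogDirs(rootDir, walRootDir)) {
            if (!Files.isDirectory(logsDir)) {
                continue;
            }
            try (DirectoryStream<Path> children = Files.newDirectoryStream(logsDir)) {
                for (Path child : children) {
                    if (Files.isDirectory(child)) {
                        serverNames.add(child.getFileName().toString());
                    }
                }
            }
        }
        return serverNames;
    }

    public static void main(String[] args) throws IOException {
        // Simulate the reported scenario in a temp dir: the old WALs live under
        // the root dir, and the newly configured WAL root is still empty.
        Path base = Files.createTempDirectory("wal-fallback-demo");
        Path rootDir = Files.createDirectories(base.resolve("hbase"));
        Path walRootDir = Files.createDirectories(base.resolve("hbaseWAL"));
        Files.createDirectories(rootDir.resolve(LOG_DIR_NAME).resolve("old-regionserver"));
        System.out.println(serverFoldersFromLogDirs(rootDir, walRootDir));
    }
}
```

In the real code path the old and new WAL locations may live on different file systems, so a fix along these lines would presumably have to list the default location through the root file system rather than through walFs. The sketch ignores that distinction.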

> Data loss while migrating to custom WAL directory (hbase.wal.dir)
> -----------------------------------------------------------------
>
>                 Key: HBASE-22628
>                 URL: https://issues.apache.org/jira/browse/HBASE-22628
>             Project: HBase
>          Issue Type: Bug
>          Components: Recovery, wal
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>            Priority: Blocker
>
> There is a data loss scenario when migrating to a custom WAL directory.
> Steps to reproduce:
>  # Set up an HBase cluster with the default settings (all WAL files are under the root directory, i.e. /hbase/WALs).
>  # Create table 't1' and insert a few records.
>  # Flush the meta table (so that the table's region entries persist in the FS).
>  # Forcibly kill the HBase processes (HM & RS).
>  # Set hbase.wal.dir to a location outside the root dir (say /hbaseWAL).
>  # Start the HBase servers.
>  # Scan 't1'.
> Ideally, HMaster should submit split tasks for the old RS WAL files (created under /hbase/WALs) so that the old data is replayed.
> But currently, during HM startup, we populate the previously dead servers only from the current WAL dir (hbase.wal.dir -> /hbaseWAL), so the WALs left under /hbase/WALs are not picked up.
> In MasterFileSystem.getFailedServersFromLogFolders(),
> {code:java}
> Set<ServerName> getFailedServersFromLogFolders() {
>   boolean retrySplitting = !conf.getBoolean("hbase.hlog.split.skip.errors",
>       WALSplitter.SPLIT_SKIP_ERRORS_DEFAULT);
>   Set<ServerName> serverNames = new HashSet<ServerName>();
>   Path logsDirPath = new Path(this.walRootDir, HConstants.HREGION_LOGDIR_NAME);
>   do {
>     if (master.isStopped()) {
>       LOG.warn("Master stopped while trying to get failed servers.");
>       break;
>     }
>     try {
>       if (!this.walFs.exists(logsDirPath)) return serverNames;
>       FileStatus[] logFolders = FSUtils.listStatus(this.walFs, logsDirPath, null);
> {code}
> For backward compatibility, we should also consider the default WAL directory path.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)