Posted to issues@hbase.apache.org by "Sean Busbey (JIRA)" <ji...@apache.org> on 2019/07/03 01:12:00 UTC

[jira] [Commented] (HBASE-22075) Potential data loss when MOB compaction fails

    [ https://issues.apache.org/jira/browse/HBASE-22075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877397#comment-16877397 ] 

Sean Busbey commented on HBASE-22075:
-------------------------------------

I've been having a lot of trouble trying to reproduce this failure using the IT on the CDH5 backport of the MOB feature. It's been confusing since the code that I think is responsible is essentially the same - bulk loading the updated references into the non-MOB regions.

At the end of last week I started looking at the bulk load code to see if that was the difference. [~npopa] suggested it was a difference in how many times failures are retried by default. It doesn't look obviously different. However, if I take the master branch and update the {{PartitionedMobCompactor}} to set {{hbase.bulkload.retries.number}} to {{MAX_INT}} on the conf it uses when calling bulk load, then the IT no longer reproduces the failure. So even if it isn't specifically the bulk load retries that are different, I think we're on the right track: the current "if it fails just bail" approach is the source of the inconsistency.
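
Roughly, the experiment amounts to overriding one key on a copy of the conf before the bulk load call. The sketch below is only an illustration of that idea, not the actual {{PartitionedMobCompactor}} change: it uses a plain Map as a stand-in for the Hadoop Configuration, and the class name, method names, the fallback of 10 attempts and the simulated failure count are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; a plain Map stands in for the Hadoop Configuration.
final class BulkLoadRetrySketch {
    static final String RETRIES_KEY = "hbase.bulkload.retries.number";
    // Assumed fallback, used only so the simulation has something to compare against.
    static final int ASSUMED_DEFAULT_RETRIES = 10;

    /** Returns a copy of the base settings with the retry count raised to Integer.MAX_VALUE. */
    static Map<String, String> withMaxRetries(Map<String, String> base) {
        Map<String, String> copy = new HashMap<>(base); // leave the caller's conf untouched
        copy.put(RETRIES_KEY, Integer.toString(Integer.MAX_VALUE));
        return copy;
    }

    /**
     * Simulates a bulk load that fails transiently failuresBeforeSuccess times.
     * Returns true if it succeeds within the configured number of attempts.
     */
    static boolean simulateBulkLoad(Map<String, String> conf, int failuresBeforeSuccess) {
        int maxAttempts = Integer.parseInt(
            conf.getOrDefault(RETRIES_KEY, Integer.toString(ASSUMED_DEFAULT_RETRIES)));
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (attempt >= failuresBeforeSuccess) {
                return true; // this attempt goes through
            }
            // otherwise the attempt failed; loop around and retry
        }
        return false; // retries exhausted: the "just bail" case
    }
}
```

With the assumed fallback, a load that needs more than 10 attempts gives up, while the same load run against {{withMaxRetries(conf)}} keeps going until it succeeds.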

I'm going to see if I can rework how we commit the post-compaction references in the non-MOB regions so that it expressly handles error conditions.

> Potential data loss when MOB compaction fails
> ---------------------------------------------
>
>                 Key: HBASE-22075
>                 URL: https://issues.apache.org/jira/browse/HBASE-22075
>             Project: HBase
>          Issue Type: Bug
>          Components: mob
>    Affects Versions: 2.1.0, 2.0.0, 2.0.1, 2.1.1, 2.0.2, 2.0.3, 2.1.2, 2.0.4, 2.1.3
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>            Priority: Critical
>              Labels: compaction, mob
>             Fix For: 2.0.6, 2.2.1, 2.1.6
>
>         Attachments: HBASE-22075-v1.patch, HBASE-22075-v2.patch, HBASE-22075.test-only.0.patch, HBASE-22075.test-only.1.patch, HBASE-22075.test-only.2.patch, ReproMOBDataLoss.java
>
>
> When MOB compaction fails during the last step (bulk load of a newly created reference file), there is a high chance of data loss due to a partially loaded reference file whose cells refer to a (now) non-existent MOB file. The newly created MOB file is deleted automatically when a MOB compaction fails, but some cells with references to this file might already have been loaded into HBase.
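
The failure mode in the description above can be pictured with a toy model. This is only a sketch under simplifying assumptions: references are loaded one at a time, a failure is injected partway through, and none of the class or method names are HBase APIs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model of the failure; every name here is hypothetical.
final class MobCompactionFailureSketch {
    final Set<String> mobFiles = new HashSet<>();
    final List<String> loadedRefs = new ArrayList<>(); // each entry names the MOB file it points at

    /** Writes a new MOB file, then loads refCount references, failing once failAfter are loaded. */
    void compact(String newMobFile, int refCount, int failAfter) {
        mobFiles.add(newMobFile);
        try {
            for (int i = 0; i < refCount; i++) {
                if (i == failAfter) {
                    throw new IllegalStateException("simulated bulk load failure");
                }
                loadedRefs.add(newMobFile);
            }
        } catch (IllegalStateException e) {
            // "If it fails just bail": the new MOB file is cleaned up,
            // but the references that were already loaded stay behind.
            mobFiles.remove(newMobFile);
        }
    }

    /** Counts loaded references whose target MOB file no longer exists. */
    long danglingRefs() {
        return loadedRefs.stream().filter(f -> !mobFiles.contains(f)).count();
    }
}
```

In this model any failure after the first reference is loaded leaves dangling references, which is the inconsistency the issue describes.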



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)