Posted to issues@hbase.apache.org by "Yunfan Zhong (JIRA)" <ji...@apache.org> on 2014/02/14 02:16:26 UTC

[jira] [Updated] (HBASE-10466) Bugs that cause flushes to be skipped during HRegion close could cause data loss

     [ https://issues.apache.org/jira/browse/HBASE-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yunfan Zhong updated HBASE-10466:
---------------------------------

    Description: 
During region close, two flushes are performed to ensure no data remains in memory. When there is data only in the current memstore, one flush is enough. When there is also data in the memstore's snapshot, two flushes are essential; otherwise we lose data. However, we recently found two bugs that cause at least one of these flushes to be skipped, leading to data loss.
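
For readers unfamiliar with the flush path, a toy model of the memstore/snapshot pair may help. This is a minimal sketch with illustrative names, not the real MemStore API: a flush moves the current memstore into a snapshot and then persists the snapshot; on failure the snapshot survives while new writes keep filling the current memstore, which is why close must drain both.

    import java.util.ArrayList;
    import java.util.List;

    // Toy model only; class, field, and method names are illustrative.
    public class MemStoreSketch {
        private final List<byte[]> current = new ArrayList<>();  // receives new writes
        private final List<byte[]> snapshot = new ArrayList<>(); // awaiting persistence

        public void write(byte[] kv) {
            current.add(kv);  // writes always land in the current memstore
        }

        // One flush attempt: move the current memstore into the snapshot,
        // then persist the snapshot. If persist() failed last time, data is
        // still sitting in 'snapshot' while new writes keep filling
        // 'current' -- which is why one flush at close is not enough.
        public void flush() throws java.io.IOException {
            if (snapshot.isEmpty()) {   // otherwise retry the old snapshot first
                snapshot.addAll(current);
                current.clear();
            }
            persist(snapshot);          // may throw; snapshot is kept on failure
            snapshot.clear();
        }

        private void persist(List<byte[]> data) throws java.io.IOException {
            // stub: the real code writes the data out to an HFile
        }
    }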

Bug 1: Wrong calculation of HRegion.memstoreSize
When a flush fails, the data to be flushed is kept in each MemStore's snapshot and waits for the next flush attempt to continue from it. But when the next flush succeeds, the counter of total memstore size in HRegion is always decremented by the sum of the current memstore sizes, not by the size of the snapshots left over from the previous failed flush. This calculation is problematic: almost every time a flush fails, HRegion.memstoreSize gets reduced by the wrong value. If region flushes cannot proceed for a few cycles, the current memstore can grow much larger than the snapshot, drifting memstoreSize far below its true value. In the extreme case, if the accumulated error grows bigger than the HRegion's memstore size limit, every further flush is skipped, because flush does nothing when memstoreSize is not greater than 0.
When the region is closing, if these flushes are skipped and data is left in the current memstore and/or the snapshot, we can lose data up to the memstore size limit of the region.
The fix is to deduct from memstoreSize the correct size of the data that is actually going to be flushed.
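
A worked example of the drift, with made-up sizes (only memstoreSize corresponds to a real field; everything else in this sketch is hypothetical):

    import java.util.concurrent.atomic.AtomicLong;

    // A failed flush left an 8 MB snapshot; 20 MB of new writes then
    // accumulated in the current memstore before the next flush succeeds.
    public class AccountingSketch {
        public static void main(String[] args) {
            AtomicLong memstoreSize = new AtomicLong((8L + 20L) << 20); // 28 MB total
            long snapshotSize = 8L << 20;   // flushed by the next successful flush
            long currentSize  = 20L << 20;  // still in memory after that flush

            // Buggy accounting: deduct the current memstore size even though
            // only the snapshot was written. The counter reads 8 MB while
            // 20 MB is actually in memory -- 12 MB too low.
            long buggy = memstoreSize.get() - currentSize;

            // Fixed accounting: deduct the size of the data actually flushed.
            long fixed = memstoreSize.get() - snapshotSize;

            System.out.println("buggy counter: " + (buggy >> 20) + " MB"); // 8 MB
            System.out.println("fixed counter: " + (fixed >> 20) + " MB"); // 20 MB
        }
    }

Each failed-then-successful flush cycle adds to the error, so repeated failures can drive the counter to 0 or below, at which point flushes become no-ops.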

Bug 2: Overly strict conditions on the first flush of region close (the so-called pre-flush)
If memstoreSize is smaller than a certain value, or if a flush is already in progress when region close starts, the first flush is skipped and only the second flush takes place. However, two flushes are required in case the previous flush failed and left some data in the snapshot. This bug can cause loss of the data in the current memstore.
The fix is to remove all conditions except the abort check, so that region close always performs two flushes.
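
A before/after sketch of the pre-flush decision. The condition and field names here are hypothetical stand-ins, not the actual 0.89-fb code; only internalFlushcache and memstoreSize appear in this report.

    import java.io.IOException;
    import java.util.concurrent.atomic.AtomicLong;

    public class PreFlushSketch {
        private final AtomicLong memstoreSize = new AtomicLong();
        private volatile boolean abortRequested;
        private volatile boolean flushInProgress;
        private static final long PREFLUSH_THRESHOLD = 5L << 20; // hypothetical

        // Buggy: the pre-flush is skipped for a small-looking memstore or
        // while another flush is running, so a snapshot left by a failed
        // flush may never be retried before the final flush at close.
        void preFlushBuggy() throws IOException {
            if (!abortRequested
                    && memstoreSize.get() > PREFLUSH_THRESHOLD
                    && !flushInProgress) {
                internalFlushcache();
            }
        }

        // Fixed: keep only the abort check, so region close always gets
        // its two flushes.
        void preFlushFixed() throws IOException {
            if (!abortRequested) {
                internalFlushcache();
            }
        }

        private void internalFlushcache() throws IOException {
            // stub: the real method snapshots and persists the memstores
        }
    }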


  was:
When a flush fails, the data to be flushed is kept in each MemStore's snapshot. The next flush attempt continues from the snapshot first. However, after a flush succeeds, the counter of total memstore size in HRegion is always decremented by the sum of the current memstore sizes. This calculation is wrong whenever the previous flush failed.
When the region is closing, there are two flushes. If some data is in the snapshot and the memstore size is incorrect, the first flush successfully persists the snapshot data, but the memstore size counter drops to 0 or below. This prevents the second flush, since HRegion.internalFlushcache() returns immediately when the total memstore size is not greater than 0. As a result, data in the memstores is lost.
This can cause mass data loss, up to the size limit of the memstores.

        Summary: Bugs that cause flushes to be skipped during HRegion close could cause data loss  (was: Wrong calculation of total memstore size in HRegion which could cause data loss)

> Bugs that cause flushes to be skipped during HRegion close could cause data loss
> --------------------------------------------------------------------------------
>
>                 Key: HBASE-10466
>                 URL: https://issues.apache.org/jira/browse/HBASE-10466
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.89-fb
>            Reporter: Yunfan Zhong
>            Priority: Critical
>             Fix For: 0.89-fb
>
>



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)