Posted to dev@zookeeper.apache.org by "Abhishek Rai (JIRA)" <ji...@apache.org> on 2016/09/10 19:10:20 UTC

[jira] [Updated] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhishek Rai updated ZOOKEEPER-2574:
------------------------------------
    Description: 
As part of the fix for ZOOKEEPER-1797, the call to FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a result, some old-looking but required txn log files can be deleted, resulting in data corruption or loss.

For example, consider the following:

1. Configuration:
autopurge.snapRetainCount=3

2. The following files exist:
log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130

The above scenario is possible when snapshots have been taken multiple times without an accompanying log rollover, which can happen if the server was running as a learner.

3. PurgeTxnLog retains all snapshots but deletes log.100 because its starting zxid (100) is older than the zxid of the oldest retained snapshot (110).  This results in loss of transactions in the range 131-140.

Before the fix for ZOOKEEPER-1797, this was avoided by the call to FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log file with starting zxid < oldest retained snapshot's highest zxid.
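
To make the difference concrete, here is a minimal Java sketch of the two retention rules described above. It is not the actual PurgeTxnLog or FileTxnSnapLog code: the class PurgeSketch and both method names are hypothetical, and logs are modelled only by their starting zxid. The first method keeps a log only if it starts at or after the oldest retained snapshot; the second additionally keeps the newest log that starts before that snapshot, which is the behaviour attributed to getSnapshotLogs() in this report.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not ZooKeeper's real implementation.
class PurgeSketch {

    // Rule described as the regression: keep a log only if its starting
    // zxid is not older than the oldest retained snapshot.
    static List<Long> retainNaive(List<Long> logStartZxids, long oldestSnapZxid) {
        List<Long> kept = new ArrayList<>();
        for (long start : logStartZxids) {
            if (start >= oldestSnapZxid) {
                kept.add(start);
            }
        }
        return kept;
    }

    // Rule described as the earlier behaviour: also keep the newest log whose
    // starting zxid is below the oldest retained snapshot, because that log
    // may still span transactions newer than the snapshot.
    static List<Long> retainWithPrecedingLog(List<Long> logStartZxids, long oldestSnapZxid) {
        long preceding = -1L; // -1 means "no such log found"
        for (long start : logStartZxids) {
            if (start < oldestSnapZxid && start > preceding) {
                preceding = start;
            }
        }
        List<Long> kept = new ArrayList<>();
        for (long start : logStartZxids) {
            if (start >= oldestSnapZxid || start == preceding) {
                kept.add(start);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Scenario from this report: a single log.100, oldest retained snapshot.110.
        List<Long> logs = Arrays.asList(100L);
        System.out.println("naive keeps:     " + retainNaive(logs, 110L));
        System.out.println("preceding keeps: " + retainWithPrecedingLog(logs, 110L));
    }
}
```

In the scenario above, the first rule retains no log at all, while the second retains log.100, so transactions 131-140 remain recoverable.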

  was:
As part of the fix for ZOOKEEPER-1797, the call to FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a result, some old-looking but required txn log files can be deleted, resulting in data corruption or loss.

For example, consider the following:

1. Configuration:
autopurge.snapRetainCount=3

2. Following files exist:
log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
snapshot.110 - snapshot as of zxid=110
snapshot.120 - snapshot as of zxid=120
snapshot.130 - snapshot as of zxid=130

3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is older than the zxid of the oldest snapshot (110).  This results in loss of transactions in the range 131-140.

Before the fix for ZOOKEEPER-1797, this was avoided by the call to FileTxnSnapLog.getSnapshotLogs() which finds the newest txn log file with starting zxid < snapshot zxid.


> PurgeTxnLog can inadvertently delete required txn log files
> -----------------------------------------------------------
>
>                 Key: ZOOKEEPER-2574
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
>         Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
>            Reporter: Abhishek Rai
>            Priority: Critical
>
> As part of the fix for ZOOKEEPER-1797, the call to FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java.  As a result, some old-looking but required txn log files can be deleted, resulting in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. The following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> The above scenario is possible when snapshots have been taken multiple times without an accompanying log rollover, which can happen if the server was running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is older than the zxid of the oldest snapshot (110).  This results in loss of transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to FileTxnSnapLog.getSnapshotLogs() which finds and retains the newest txn log file with starting zxid < oldest retained snapshot's highest zxid.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)