Posted to dev@zookeeper.apache.org by "Rakesh R (JIRA)" <ji...@apache.org> on 2016/11/07 05:29:59 UTC
[jira] [Commented] (ZOOKEEPER-2574) PurgeTxnLog can inadvertently delete required txn log files
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15643171#comment-15643171 ]
Rakesh R commented on ZOOKEEPER-2574:
-------------------------------------
Thank you [~abhishekrai]. I can see that a few more corrections are required in [zookeeperAdmin.html#The+Log+Directory|https://zookeeper.apache.org/doc/r3.4.9/zookeeperAdmin.html#The+Log+Directory], right? I'd suggest reading the sections of the ZK docs related to the {{ZooKeeper transaction log and snapshot}} and making the necessary changes. I appreciate your time and effort.
{code}
The Log Directory:
A new log file is started each time a snapshot is begun. The log file's suffix is the first zxid written to that log
{code}
Could you please create a pull request for the proposed patch? It will be used for code review and commit.
Hi [~fpj], [~phunt], [~rgs]. As part of this JIRA we have come across a situation {{"where snapshotting has happened multiple times without accompanying log rollover"}}, which contradicts the ZooKeeper docs. I think this could be a serious concern for disaster recovery scripts, for example if someone has written a script that blindly relies on the statement "A new log file is started each time a snapshot is begun". It would be really helpful if you could pitch in and give your thoughts on this problem. Thanks!
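A recovery script that depends on the documented behaviour could first check whether that behaviour actually holds. If it does not, the script should refuse to delete anything. The sketch below is a hypothetical Python heuristic. The function name and the approach are illustrative and are not part of ZooKeeper. It works only on the zxids taken from file names and assumes, as the quoted documentation states, that a log file's suffix is the first zxid written to that log.

```python
from typing import Iterable, List


def snapshots_sharing_a_log(log_zxids: Iterable[int],
                            snap_zxids: Iterable[int]) -> List[int]:
    """Flag snapshots that were not followed by a log rollover (heuristic).

    Under the documented model, the gap between two consecutive snapshots
    should contain the start of at least one new txn log. The newest
    snapshot is not checked, because no transaction may have been written
    since it was taken.
    """
    logs = sorted(log_zxids)
    snaps = sorted(snap_zxids)
    flagged = []
    for older, newer in zip(snaps, snaps[1:]):
        # Look for a log whose first zxid falls in (older, newer].
        if not any(older < z <= newer for z in logs):
            flagged.append(older)
    return flagged


# Files from the example in the issue description:
# log.100, snapshot.110, snapshot.120, snapshot.130
print(snapshots_sharing_a_log([100], [110, 120, 130]))  # [110, 120]
```

A non-empty result means that at least one log older than the oldest snapshot may still hold transactions the snapshots do not cover. In that case a script following the documentation's assumption should not delete logs.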
> PurgeTxnLog can inadvertently delete required txn log files
> -----------------------------------------------------------
>
> Key: ZOOKEEPER-2574
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2574
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.7, 3.4.8, 3.5.0, 3.5.1, 3.5.2
> Environment: Zookeeper 3.4.8, standalone, and 3-server quorum
> Reporter: Abhishek Rai
> Assignee: Abhishek Rai
> Priority: Blocker
> Fix For: 3.4.10, 3.5.3
>
> Attachments: ZOOKEEPER-2574.2.patch, ZOOKEEPER-2574.3.patch, ZOOKEEPER-2574.4.patch, ZOOKEEPER-2574.5.patch, ZOOKEEPER-2574.6.patch, ZOOKEEPER-2574.patch
>
>
> As part of the fix for ZOOKEEPER-1797, the call to FileTxnSnapLog.getSnapshotLogs() was removed from PurgeTxnLog.java. As a result, some old-looking but required txn log files can be deleted, resulting in data corruption or loss.
> For example, consider the following:
> 1. Configuration:
> autopurge.snapRetainCount=3
> 2. The following files exist:
> log.100 spans transactions from zxid=100 till zxid=140 (inclusive)
> snapshot.110 - snapshot as of zxid=110
> snapshot.120 - snapshot as of zxid=120
> snapshot.130 - snapshot as of zxid=130
> The above scenario is possible when snapshotting has happened multiple times without an accompanying log rollover, which can occur if the server was running as a learner.
> 3. PurgeTxnLog retains all snapshots but deletes log.100 because its zxid is older than the zxid of the oldest snapshot (110). This results in loss of transactions in the range 131-140.
> Before the fix for ZOOKEEPER-1797, this was avoided by the call to FileTxnSnapLog.getSnapshotLogs(), which finds and retains the newest txn log file whose starting zxid is less than the oldest retained snapshot's highest zxid.
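The following minimal Python sketch contrasts the two retention rules described above. It is not ZooKeeper's PurgeTxnLog code; ZooKeeper itself is written in Java. It reduces every file to the zxid in its name, and all function names are hypothetical. `logs_to_keep_naive` models the post-ZOOKEEPER-1797 behaviour. `logs_to_keep_fixed` adds the extra rule that getSnapshotLogs() used to provide, using the strict "less than" comparison stated in the description.

```python
from typing import Iterable, Set


def oldest_retained_snapshot(snap_zxids: Iterable[int], retain_count: int) -> int:
    """Return the zxid of the oldest snapshot that survives the purge."""
    if retain_count < 1:
        raise ValueError("retain_count must be at least 1")
    snaps = sorted(snap_zxids)
    if not snaps:
        raise ValueError("no snapshots to retain")
    return snaps[-retain_count:][0]


def logs_to_keep_naive(log_zxids: Iterable[int], snap_zxids: Iterable[int],
                       retain_count: int) -> Set[int]:
    """Keep only logs whose starting zxid is >= the oldest retained snapshot."""
    oldest = oldest_retained_snapshot(snap_zxids, retain_count)
    return {z for z in log_zxids if z >= oldest}


def logs_to_keep_fixed(log_zxids: Iterable[int], snap_zxids: Iterable[int],
                       retain_count: int) -> Set[int]:
    """Also keep the newest log that starts before the oldest retained
    snapshot, since it may hold transactions written after that snapshot."""
    logs = list(log_zxids)
    oldest = oldest_retained_snapshot(snap_zxids, retain_count)
    keep = {z for z in logs if z >= oldest}
    older = [z for z in logs if z < oldest]
    if older:
        keep.add(max(older))
    return keep


logs = [100]             # log.100 spans zxids 100..140
snaps = [110, 120, 130]  # snapshot.110, snapshot.120, snapshot.130
print(sorted(logs_to_keep_naive(logs, snaps, 3)))  # []
print(sorted(logs_to_keep_fixed(logs, snaps, 3)))  # [100]
```

With autopurge.snapRetainCount=3, the naive rule keeps no logs at all. Transactions 131-140 then exist nowhere except in the deleted log.100. The fixed rule keeps log.100 because it is the newest log that starts before snapshot.110.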
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)