Posted to dev@zookeeper.apache.org by "Abhay Bothra (JIRA)" <ji...@apache.org> on 2017/04/10 22:45:41 UTC
[jira] [Comment Edited] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing
[ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963604#comment-15963604 ]
Abhay Bothra edited comment on ZOOKEEPER-2325 at 4/10/17 10:44 PM:
-------------------------------------------------------------------
We saw a scenario where a ZooKeeper cluster had a log.1 file but no snapshot.0. Is this a valid state for the data dir? If so, this change prevents ZooKeeper from restoring from a transaction log alone.
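The data dir state described in this comment (a log.1 file with no snapshot.0) can be detected before startup. The following is a minimal sketch, not part of ZooKeeper: the class name `DataDirCheck` and the method `hasLogsButNoUsableSnapshot` are hypothetical, and "usable" here only means "non-empty", which is weaker than ZooKeeper's own snapshot validation.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not a ZooKeeper class.
final class DataDirCheck {

    // Returns true if the directory holds at least one "log.*" file
    // but no non-empty "snapshot.*" file.
    static boolean hasLogsButNoUsableSnapshot(Path dir) {
        boolean sawLog = false;
        boolean sawNonEmptySnapshot = false;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (name.startsWith("log.")) {
                    sawLog = true;
                } else if (name.startsWith("snapshot.") && Files.size(p) > 0) {
                    // Assumption: a non-empty file counts as usable here.
                    sawNonEmptySnapshot = true;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sawLog && !sawNonEmptySnapshot;
    }

    public static void main(String[] args) {
        // The directory to inspect is passed as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(hasLogsButNoUsableSnapshot(dir));
    }
}
```

Run against a node's snapshot/log directory, this reports whether the node is in the log-only state the comment asks about.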
> Data inconsistency if all snapshots empty or missing
> ----------------------------------------------------
>
> Key: ZOOKEEPER-2325
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Andrew Grasso
> Priority: Critical
> Fix For: 3.5.4, 3.6.0
>
> Attachments: zk.patch, ZOOKEEPER-2325.001.patch, ZOOKEEPER-2325-test.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the result of FileSnap.deserialize, which is -1L if no valid snapshots are found. Recovery then proceeds with dt.lastProcessed == 0, its initial value.
> As a result, ZooKeeper replays the transaction logs and then begins serving requests with a state that differs from the rest of the ensemble.
> To reproduce:
> 1. In a healthy ZooKeeper cluster of size >= 3, shut down one node.
> 2. Either delete all snapshots for this node or replace them all with empty files.
> 3. Restart the node.
> We believe this can happen organically if a node runs out of disk space.
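The failure mode in the description comes down to a sentinel return value (-1L) being ignored. The sketch below models one possible policy for honoring it. It is not ZooKeeper's code and not necessarily the attached patch: the class `SnapshotRestoreSketch`, the method `startingZxid`, and the parameter `lastLoggedZxid` are hypothetical names introduced for illustration.

```java
// Hypothetical model of the startup restore decision; not ZooKeeper's code.
final class SnapshotRestoreSketch {

    // Sentinel reported when no valid snapshot was found (per the description).
    static final long NO_VALID_SNAPSHOT = -1L;

    // deserializeResult: zxid of the loaded snapshot, or -1L if none was valid.
    // lastLoggedZxid:    assumed input, highest zxid in the txn logs, or -1L if
    //                    there are no logged transactions.
    // Returns the zxid from which transaction log replay may start.
    static long startingZxid(long deserializeResult, long lastLoggedZxid) {
        if (deserializeResult == NO_VALID_SNAPSHOT) {
            if (lastLoggedZxid != -1L) {
                // Logs exist but no snapshot: refuse instead of silently
                // replaying from an empty tree.
                throw new IllegalStateException(
                        "No valid snapshot found, but transaction logs exist");
            }
            return 0L; // nothing on disk at all: treat as a fresh server
        }
        return deserializeResult;
    }

    public static void main(String[] args) {
        System.out.println(startingZxid(42L, 50L));
        System.out.println(startingZxid(NO_VALID_SNAPSHOT, -1L));
        try {
            startingZxid(NO_VALID_SNAPSHOT, 50L);
        } catch (IllegalStateException e) {
            System.out.println("refused: " + e.getMessage());
        }
    }
}
```

Note that a strict policy like this would also refuse to start from the log-only data dir raised in the comment above (a log.1 with no snapshot.0), so whether that state is legitimate matters for the design.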