
Posted to dev@zookeeper.apache.org by "Abhay Bothra (JIRA)" <ji...@apache.org> on 2017/04/10 22:45:41 UTC

[jira] [Comment Edited] (ZOOKEEPER-2325) Data inconsistency if all snapshots empty or missing

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15963604#comment-15963604 ] 

Abhay Bothra edited comment on ZOOKEEPER-2325 at 4/10/17 10:44 PM:
-------------------------------------------------------------------

We saw a scenario where the ZooKeeper cluster had a log.1 file but no snapshot.0. Is this a possible state of the data dir? If so, this change prevents ZooKeeper from restoring from just a transaction log.


was (Author: bothra90):
We saw a scenario where the zookeeper cluster had a log.1 file, but no snapshot.0. Is this a possible state of the data dir? If yes, this change prevents Zookeeper from restoring from just a txn log

> Data inconsistency if all snapshots empty or missing
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2325
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2325
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>            Reporter: Andrew Grasso
>            Priority: Critical
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: zk.patch, ZOOKEEPER-2325.001.patch, ZOOKEEPER-2325-test.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> When loading state from snapshots on startup, FileTxnSnapLog.java ignores the result of FileSnap.deserialize, which is -1L if no valid snapshots are found. Recovery proceeds with dt.lastProcessed == 0, its initial value.
> The result is that ZooKeeper will process the transaction logs and then begin serving requests with a different state than the rest of the ensemble.
> To reproduce:
> 1. In a healthy ZooKeeper cluster of size >= 3, shut down one node.
> 2. Either delete all snapshots for this node or replace them all with empty files.
> 3. Restart the node.
> We believe this can happen organically if a node runs out of disk space.
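
The check that the description says is missing can be sketched as follows. This is a minimal, self-contained model and not ZooKeeper's actual code: the class SnapshotRestoreSketch, the method restore, and its two parameters are hypothetical names chosen for illustration. The only detail taken from the issue is the -1L sentinel returned when no valid snapshot is found. The choice of IllegalStateException is also an assumption, made to keep the sketch short.

```java
// Hypothetical, simplified model of the startup restore decision.
// None of these names are ZooKeeper APIs.
final class SnapshotRestoreSketch {

    // Sentinel from the issue description: no valid snapshot was found.
    static final long NO_VALID_SNAPSHOT = -1L;

    // Assumed sentinel for "the transaction log contains no entries".
    static final long NO_LOG_ENTRIES = -1L;

    /**
     * Decides which zxid to resume from.
     *
     * @param snapshotZxid   zxid restored from the newest valid snapshot,
     *                       or NO_VALID_SNAPSHOT if none could be read
     * @param lastLoggedZxid highest zxid found in the transaction logs,
     *                       or NO_LOG_ENTRIES if the logs are empty
     */
    static long restore(long snapshotZxid, long lastLoggedZxid) {
        if (snapshotZxid == NO_VALID_SNAPSHOT) {
            if (lastLoggedZxid != NO_LOG_ENTRIES) {
                // Logs without any snapshot: replaying them onto an empty
                // tree could diverge from the ensemble, so refuse to start.
                throw new IllegalStateException(
                    "No valid snapshot found, but transaction log entries exist");
            }
            // Nothing on disk at all: treat this as a brand-new server.
            return 0L;
        }
        // Normal case: snapshot loaded, logs replayed on top of it.
        return Math.max(snapshotZxid, lastLoggedZxid);
    }

    public static void main(String[] args) {
        System.out.println("fresh server resumes at zxid " + restore(-1L, -1L));
        System.out.println("snapshot plus logs resumes at zxid " + restore(10L, 25L));
        try {
            restore(-1L, 7L);
        } catch (IllegalStateException e) {
            System.out.println("refused to start: " + e.getMessage());
        }
    }
}
```

Note that a check this strict also rejects the data dir described in the comment above (a log.1 file with no snapshot.0), which is the trade-off the comment asks about.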


