Posted to dev@zookeeper.apache.org by Yang Sirius <al...@outlook.com> on 2022/12/13 04:52:47 UTC

Two issues in ZooKeeper that might cause data inconsistency or committed data loss

Hi everyone!

Recently we discovered two issues in the latest versions of ZooKeeper that might cause data inconsistency or loss of committed data. Details and analysis of the issues are available on JIRA:


  *   ZOOKEEPER-4643 <https://issues.apache.org/jira/browse/ZOOKEEPER-4643>: Committed txns may be improperly truncated if a follower crashes right after updating currentEpoch but before persisting txns to disk.
  *   ZOOKEEPER-4646 <https://issues.apache.org/jira/browse/ZOOKEEPER-4646>: Committed txns may still be lost if followers crash after replying ACK-LD but before writing txns to disk. (This issue is related to the fix for ZOOKEEPER-3911 <https://issues.apache.org/jira/browse/ZOOKEEPER-3911>.)
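
To make the crash window in the first issue easier to picture, here is a minimal toy model in Python. It is only a sketch and not ZooKeeper code: the class ToyFollower, its methods, and the plain-integer zxids are all hypothetical, and it assumes that leader election prefers the higher epoch first and the higher last zxid second. It shows a follower whose epoch became durable while its txns did not, and how such a follower can outrank a peer that holds the full log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFollower:
    """Toy model of a follower's durable state (hypothetical, not ZooKeeper's classes)."""
    current_epoch: int = 0
    log: list = field(default_factory=list)  # persisted zxids, in order

    def sync(self, new_epoch, txns, crash_after_epoch=False):
        # Step 1: the new epoch is made durable first.
        self.current_epoch = new_epoch
        if crash_after_epoch:
            # Simulated crash: the txns never reach the durable log.
            return False
        # Step 2: the txns are made durable.
        self.log.extend(txns)
        return True

    def vote_key(self):
        # Assumed election ordering: higher epoch wins, then higher last zxid.
        last_zxid = self.log[-1] if self.log else 0
        return (self.current_epoch, last_zxid)


def demo():
    # A peer that holds all committed txns but is still in the old epoch.
    up_to_date = ToyFollower(current_epoch=1, log=[1, 2, 3])
    # A follower that crashes between the epoch update and the log write.
    crashed = ToyFollower(current_epoch=1, log=[1])
    crashed.sync(new_epoch=2, txns=[2, 3], crash_after_epoch=True)
    return up_to_date, crashed


if __name__ == "__main__":
    up_to_date, crashed = demo()
    print("crashed follower:", crashed.vote_key())
    print("up-to-date peer: ", up_to_date.vote_key())
    print("crashed follower outranks peer:", crashed.vote_key() > up_to_date.vote_key())
```

Under these assumptions the restarted follower compares as "newer" even though it is missing txns 2 and 3, which is the kind of state that could lead to committed txns being truncated on other servers. The real traces on JIRA are the authoritative description.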

The issues seem to be critical since they lead to data loss or inconsistency, which violates the properties that ZAB is supposed to satisfy. I wonder whether these bugs should be fixed, since data consistency is of prime importance to ZooKeeper. If so, I will try to fix the code, together with further testing and verification.

Thanks!

Attached are example traces of these two issues, generated on multiple versions including 3.8.0 and 3.7.1. (The traces are also provided on JIRA.)
Trace-ZK-4643:
Trace-ZK-4646:

Re: Two issues in ZooKeeper that might cause data inconsistency or committed data loss

Posted by Enrico Olivelli <eo...@gmail.com>.
Yang,
Thanks for your report.


On Tue, 13 Dec 2022, 18:16 Yang Sirius <al...@outlook.com> wrote:

> Hi everyone!
>
> Recently we discovered two issues in the ZooKeeper’s latest versions that
> might cause data inconsistency or committed data loss. Details and analysis
> of the issues are presented on JIRA:
>
>
>    - ZOOKEEPER-4643 <https://issues.apache.org/jira/browse/ZOOKEEPER-4643> :
>     Committed txns may be improperly truncated if follower crashes right
>    after updating currentEpoch but before persisting txns to disk.
>    - ZOOKEEPER-4646 <https://issues.apache.org/jira/browse/ZOOKEEPER-4646>
>     : Committed txns may still be lost if followers crash after replying
>    ACK-LD but before writing txns to disk. (This issue is related to the fix
>    of ZOOKEEPER-3911
>    <https://issues.apache.org/jira/browse/ZOOKEEPER-3911>)
>
>
> The issues seem to be critical since they lead to data loss or
> inconsistency, which violate the properties that ZAB is supposed to
> satisfy. I wonder whether the bugs should get a fix since data consistency
> is of prime importance of ZooKeeper. If so, I will try to fix the code
> together with further testing and verification techniques.
>

Help is always welcome!

I personally don't have time to investigate and code a fix, but I will be
happy to review your work.

Thank you very much

Sharing problems and solutions is fundamental for an OSS community like
Apache ZooKeeper

Cheers
Enrico


