
Posted to dev@zookeeper.apache.org by Alexander Shraer <sh...@gmail.com> on 2013/06/06 16:46:11 UTC

Re: Detecting data loss

A server doesn't know which of the operations in its log were
committed, and so can't say whether truncating the log is something
that resulted from normal operation or from some failure.

Truncating the log is often normal. Suppose that A is the leader and
receives 10 operations from some client to propose, but loses
leadership because of temporary network problems. Then B talks with C
and becomes the leader, and neither B nor C knows of the new 10
operations. That is ok: since they never reached a quorum, they were
never committed. Then A reconnects to B and C, and at this point A's
log gets truncated to match that of B and C.
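A minimal sketch of the scenario above follows. It assumes a simplified model in which each log entry is just a zxid written as an (epoch, counter) tuple. The function names are hypothetical; this is not ZooKeeper's actual sync code. A's divergent suffix (its 10 uncommitted proposals) is dropped, and A adopts the leader's entries.

```python
from typing import List, Tuple

# Hypothetical model: a zxid is (epoch, counter); tuples compare
# lexicographically, matching the order in which proposals are issued.
Zxid = Tuple[int, int]


def common_prefix_len(a: List[Zxid], b: List[Zxid]) -> int:
    """Number of leading entries the two logs agree on."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def sync_with_leader(follower: List[Zxid],
                     leader: List[Zxid]) -> Tuple[List[Zxid], List[Zxid]]:
    """Truncate the follower's divergent suffix and adopt the leader's entries.

    Returns (new follower log, entries the follower discarded).
    """
    k = common_prefix_len(follower, leader)
    discarded = follower[k:]             # e.g. A's 10 uncommitted proposals
    new_log = follower[:k] + leader[k:]  # ends up identical to the leader's log
    return new_log, discarded


if __name__ == "__main__":
    committed = [(1, i) for i in range(1, 6)]           # known to A, B and C
    a_log = committed + [(1, i) for i in range(6, 16)]  # A's 10 extra proposals
    b_log = committed + [(2, 1), (2, 2)]                # B leads in epoch 2
    new_a, dropped = sync_with_leader(a_log, b_log)
    print(len(dropped), new_a == b_log)                 # 10 True
```

From the follower's point of view, nothing in this step distinguishes "dropping uncommitted proposals" from "dropping committed history". That is the point made at the top of the message.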

We could detect a problem if we stored the id of the last committed
operation on disk: truncating beyond that id is obviously wrong. But
that would not always work, because a server may miss the last few
commit messages (it's a quorum protocol). It may be better than
nothing, though.
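Here is a minimal sketch of that check. It uses the same hypothetical (epoch, counter) model of zxids, plus a made-up `DataLossError` and `check_truncation`; neither is a ZooKeeper API. The check never fires wrongly, because the persisted id really was committed. It is incomplete, though: if a server missed the last few commits, its persisted id lags behind, and losing those entries goes unnoticed.

```python
from typing import List, Optional, Tuple

Zxid = Tuple[int, int]  # hypothetical (epoch, counter) model of a zxid


class DataLossError(Exception):
    """Hypothetical signal: a sync tried to drop an entry known to be committed."""


def check_truncation(discarded: List[Zxid],
                     last_committed_on_disk: Optional[Zxid]) -> None:
    """Raise if the discarded suffix reaches back to the persisted commit point.

    `discarded` is a suffix of the local log, so its first entry is the
    smallest zxid being dropped.
    """
    if not discarded or last_committed_on_disk is None:
        return
    if discarded[0] <= last_committed_on_disk:
        raise DataLossError(
            f"truncating {discarded[0]} would drop committed history "
            f"(last committed on disk: {last_committed_on_disk})")


if __name__ == "__main__":
    # Normal case: only A's uncommitted proposals are dropped.
    check_truncation([(1, i) for i in range(6, 16)], (1, 5))
    # Missed commits: A only learned that (1, 3) was committed, so losing
    # (1, 4) and (1, 5) slips through undetected.
    check_truncation([(1, 4), (1, 5)], (1, 3))
    try:
        check_truncation([(1, 5), (1, 6)], (1, 5))
    except DataLossError as e:
        print("detected:", e)
```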

Alex

On Thu, May 30, 2013 at 2:57 PM, Dave Katz <dk...@dkatz.org> wrote:
> Are there any hooks by which the Zookeeper server can signal that it has lost data?  It seems at least theoretically possible that when a server is reconciling its state with other servers that it could detect history truncation and signal it (even as crudely as throwing an exception).  This would provide a mechanism with which an elastic system could do last-ditch recovery when things fell apart.
>
> Thanks,
>
> --Dave
>