Posted to user@zookeeper.apache.org by Gaurav Saxena <gs...@gmail.com> on 2014/08/20 19:24:09 UTC

Data loss scenario

I am curious about what looks like a data loss scenario. I describe it below.

There are three ZooKeeper servers A, B, and C.
1. At one point in time t1 the state of the system is as follows:
A is up and contains data d1, d2. A is master
B is up and contains data d1, d2
C is up and contains data d1, d2

2. At time t2 C goes down. The state of the system at t2 is
A is up and contains data d1, d2. A is master
B is up and contains data d1, d2
C is down and its log contains data d1, d2

3. At time t3 the state of the system changes
A is up and contains data d1, d2, d3. A is master
B is up and contains data d1, d2, d3
C is down and its log contains data d1, d2

4. At time t4, C comes up and also becomes the master, while A and B are
also up

Question: Because C is master, will the logs of A and B be truncated to
contain only d1 and d2? Is this considered a data loss scenario? If yes, is
there an issue around it?

-- 
Regards
Gaurav Saxena

Re: Data loss scenario

Posted by Gaurav Saxena <gs...@gmail.com>.
Thanks a lot Alexander. That's a great starting point. I will look into the
code.



-- 
Regards
Gaurav Saxena

Re: Data loss scenario

Posted by Alexander Shraer <sh...@gmail.com>.
I think it's:

src/java/main/org/apache/zookeeper/server/quorum/Leader.java:
waitForEpochAck throws an exception if the follower is ahead of the leader in
terms of data, as in your example.

src/java/main/org/apache/zookeeper/server/quorum/LearnerHandler.java: run()
throws an exception if the follower has a more up-to-date configuration than
the leader.
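
A minimal sketch of the first check, under stated assumptions: this is a
standalone model, not the ZooKeeper source. The class EpochAckSketch, the
Summary type, and checkFollower are hypothetical names, and the zxid values are
plain counters (1, 2, 3 for d1, d2, d3) rather than real ZooKeeper zxids. It
only illustrates the idea that a would-be leader rejects its own leadership
when a connecting follower has seen more than it has.

```java
import java.io.IOException;

/** Simplified, hypothetical model of the "is the follower ahead?" check. */
final class EpochAckSketch {

    /** A server's current epoch and the id of the last transaction it logged. */
    static final class Summary {
        final long currentEpoch;
        final long lastZxid;

        Summary(long currentEpoch, long lastZxid) {
            this.currentEpoch = currentEpoch;
            this.lastZxid = lastZxid;
        }

        /** Higher epoch wins; within the same epoch, the longer log wins. */
        boolean isMoreRecentThan(Summary other) {
            return currentEpoch > other.currentEpoch
                || (currentEpoch == other.currentEpoch && lastZxid > other.lastZxid);
        }
    }

    /** Throws if the connecting follower has seen more than the would-be leader. */
    static void checkFollower(Summary leader, Summary follower) throws IOException {
        if (follower.isMoreRecentThan(leader)) {
            throw new IOException("Follower is ahead of the leader");
        }
    }

    public static void main(String[] args) {
        Summary c = new Summary(1, 2); // C logged d1, d2 (assumed numbering)
        Summary a = new Summary(1, 3); // A logged d1, d2, d3
        try {
            checkFollower(c, a);       // C acting as leader, A connecting
            System.out.println("C accepted A as a follower");
        } catch (IOException e) {
            System.out.println("C abandons leadership: " + e.getMessage());
        }
    }
}
```

With the values from the scenario, the check fails for C as leader, so C cannot
finish establishing itself once A or B connects.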

Since a leader needs support from a quorum, at least one of the servers that
knows about d3 will have to connect to C when it tries to become leader (d3
was committed, so a majority holds it, and any two majorities intersect). So C
will not be able to gather the required support without triggering the checks
above.
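
The intersection argument can be checked by brute force for the three-server
ensemble in the example. The sketch below is illustrative only; the class and
method names are hypothetical and nothing here is ZooKeeper code. It enumerates
every strict majority of {A, B, C} and tests whether each one contains at least
one server from a given set of holders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Brute-force check that every majority meets the set of servers holding d3. */
final class QuorumIntersectionSketch {

    /** Returns all subsets of servers that form a strict majority. */
    static List<Set<String>> majorities(List<String> servers) {
        List<Set<String>> result = new ArrayList<>();
        int n = servers.size();
        for (int mask = 0; mask < (1 << n); mask++) {
            if (Integer.bitCount(mask) * 2 > n) {
                Set<String> quorum = new HashSet<>();
                for (int i = 0; i < n; i++) {
                    if ((mask & (1 << i)) != 0) {
                        quorum.add(servers.get(i));
                    }
                }
                result.add(quorum);
            }
        }
        return result;
    }

    /** True if every majority contains at least one server from holders. */
    static boolean everyQuorumMeets(List<String> servers, Set<String> holders) {
        for (Set<String> quorum : majorities(servers)) {
            Set<String> common = new HashSet<>(quorum);
            common.retainAll(holders);
            if (common.isEmpty()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("A", "B", "C");
        Set<String> knowD3 = new HashSet<>(Arrays.asList("A", "B"));
        Set<String> onlyC = new HashSet<>(Arrays.asList("C"));
        System.out.println(everyQuorumMeets(servers, knowD3)); // prints true
        System.out.println(everyQuorumMeets(servers, onlyC));  // prints false
    }
}
```

Every quorum C could assemble includes A or B, which is why C cannot collect
majority support without meeting a server that has d3.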

In fact, C is very unlikely to get as far as trying to become the leader. As
Henry mentioned, ZooKeeper has a preliminary protocol, implemented in
FastLeaderElection.java, which tries to make sure that the candidate leader
has the most up-to-date data and support from a quorum. This is how the
candidate is chosen; the other servers then establish connections to it. The
checks above cover the case where, by the time connections to the candidate
are established, a server it had not heard from during FastLeaderElection
connects and the candidate discovers that it should not be the leader. It
then gives up and returns to FastLeaderElection.
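
A rough sketch of how such an election can prefer the most up-to-date server,
assuming votes are ordered by epoch, then by last zxid, then by server id as a
tie-breaker. The class VoteOrderSketch, the Vote type, and the numeric values
are hypothetical and chosen to match the example; this is a simplified model of
the ordering idea, not the FastLeaderElection implementation.

```java
/** Simplified model of ordering election votes by how up to date a server is. */
final class VoteOrderSketch {

    /** One server's proposal: its epoch, last logged zxid, and its own id. */
    static final class Vote {
        final long epoch;
        final long zxid;
        final long serverId;

        Vote(long epoch, long zxid, long serverId) {
            this.epoch = epoch;
            this.zxid = zxid;
            this.serverId = serverId;
        }
    }

    /** True if candidate should replace current as the preferred leader. */
    static boolean beats(Vote candidate, Vote current) {
        return candidate.epoch > current.epoch
            || (candidate.epoch == current.epoch
                && (candidate.zxid > current.zxid
                    || (candidate.zxid == current.zxid
                        && candidate.serverId > current.serverId)));
    }

    /** Returns the vote that wins against all others under beats(). */
    static Vote best(Vote... votes) {
        Vote winner = votes[0];
        for (Vote v : votes) {
            if (beats(v, winner)) {
                winner = v;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        Vote a = new Vote(1, 3, 1); // A has d1, d2, d3
        Vote b = new Vote(1, 3, 2); // B has d1, d2, d3
        Vote c = new Vote(1, 2, 3); // C has only d1, d2
        System.out.println("elected server id: " + best(a, b, c).serverId); // prints 2
    }
}
```

C has the highest server id here, yet it still loses, because the length of
the log is compared before the id.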






Re: Data loss scenario

Posted by Gaurav Saxena <gs...@gmail.com>.
Thanks! That's great. If someone can point me to the code where this is
decided, it would be a great help, as I have to present evidence that this
scenario will not happen.





-- 
Regards
Gaurav Saxena

Re: Data loss scenario

Posted by Henry Robinson <he...@cloudera.com>.
IIRC, C cannot become the master because it does not have all the changes
that A and B have seen. The leader election protocol can take care of
ensuring the invariant that the elected master must be the most up-to-date
of all peers. (Alternatively, a protocol could have the new master request
the missing log suffix from its peers during election, but I believe,
although it's a while since I checked, that ZK does the former. Someone can
fill in the details / correct me.)

Henry





-- 
Henry Robinson
Software Engineer
Cloudera
415-994-6679