Posted to dev@zookeeper.apache.org by Hongchao Deng <hd...@cloudera.com> on 2015/03/10 04:31:02 UTC

Changing sync() to need quorum ack

Hi all,

I recently worked on fixing a flaky test, testPortChange(), which is
related to ZOOKEEPER-2000.

This is what I have figured out:

* Servers (1) and (2) were followers; (3) was the leader.
* A client connected to (1) and did a reconfig().
* (1) and (2) formed a quorum; the reconfig succeeded and returned.
* (3) still thought it was the leader, so it kept using
LeaderZooKeeperServer.
* A client connected to (3) did a sync(), and the sync didn't go through a
quorum. THE CLIENT WHO DID SYNC() GETS WRONG BEHAVIOR. There's a split
brain here for sync().
* Then (3) gradually moved to the new quorum config.
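
To make the scenario above concrete, here is a toy Python model. It is not
ZooKeeper code: the Server class, commit() and local_sync_then_read() are
names made up for illustration, and the "/config" key simply stands in for
whatever state the reconfig changed. It only shows why a sync() answered
locally by a stale leader can leave the client reading old data.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy replica: its id, the role it believes it has, and its local state."""
    sid: int
    role: str                      # "leader" or "follower", as the server sees it
    config_version: int = 0
    data: dict = field(default_factory=dict)


def commit(quorum, key, value, config_version):
    """Apply a committed change to exactly the servers that acknowledged it."""
    for server in quorum:
        server.data[key] = value
        server.config_version = config_version


def local_sync_then_read(server, key):
    """Hypothetical sync() that a self-believed leader answers on its own.

    No other server is contacted, so the read that follows reflects only
    this server's local state.
    """
    return server.data.get(key)


s1 = Server(1, "follower", data={"/config": "v0"})
s2 = Server(2, "follower", data={"/config": "v0"})
s3 = Server(3, "leader", data={"/config": "v0"})

# The reconfig commits on the quorum {1, 2}; server 3 has not learned of it.
commit([s1, s2], "/config", "v1", config_version=1)

stale = local_sync_then_read(s3, "/config")   # client connected to (3)
fresh = s1.data["/config"]                    # what the new quorum agreed on
print(stale, fresh)  # v0 v1
```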

I'm proposing to change sync() to require quorum acks. I talked privately
with my friend Xiang Li, who works on etcd. He previously ran into a similar
problem and ended up changing sync to go through a quorum.
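
As a rough sketch of what requiring quorum acks buys, assume each server
tracks a "view" number that increases when the ensemble moves on without the
old leader. The view number, quorum_sync() and QuorumLost are hypothetical
names for this illustration, not ZooKeeper or etcd APIs. A sync then succeeds
only if a strict majority still recognises the leader's view.

```python
class QuorumLost(Exception):
    """Raised when a sync cannot gather acks from a strict majority."""


def quorum_sync(leader_view, peer_views):
    """Toy quorum-acked sync.

    The leader counts itself plus every peer whose view matches its own.
    peer_views maps a peer's id to the view that peer currently holds.
    """
    ensemble_size = len(peer_views) + 1
    acks = 1 + sum(1 for view in peer_views.values() if view == leader_view)
    if acks * 2 <= ensemble_size:
        raise QuorumLost(f"only {acks} of {ensemble_size} acks")
    return acks


# Server 3 still believes it leads view 1; servers 1 and 2 moved to view 2.
try:
    quorum_sync(leader_view=1, peer_views={1: 2, 2: 2})
    outcome = "sync succeeded"
except QuorumLost as exc:
    outcome = f"sync rejected: {exc}"

print(outcome)  # sync rejected: only 1 of 3 acks

# A leader whose followers share its view gets its majority as usual.
healthy_acks = quorum_sync(leader_view=2, peer_views={1: 2, 3: 2})
```

In this model the stale leader cannot complete the sync, so the client gets
an error instead of a silently stale answer.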

Since this change affects the behavior of sync(), I'm asking publicly whether
anyone has concerns or relies on assumptions about it. Let's discuss it here.

Best,
-- 
*- Hongchao Deng*
*Software Engineer*

Re: Changing sync() to need quorum ack

Posted by Marshall McMullen <ma...@gmail.com>.
+1. This is how we believed sync was implemented already. Getting these
semantics correct would be very important for us.
On Mar 10, 2015 2:57 AM, "Flavio Junqueira" <fp...@yahoo.com.invalid>
wrote:

> For one thing, this should clean up the mess we had to make in the code
> to have sync() work the way it does, since it was neither a regular read
> nor a regular quorum write. I don't know why you say that it changes the
> behavior. It changes the internal behavior, but the expected behavior
> exposed through the API call remains the same, so no user should need to
> care about it; it doesn't break any code.
>
> -Flavio

Re: Changing sync() to need quorum ack

Posted by Flavio Junqueira <fp...@yahoo.com.INVALID>.
For one thing, this should clean up the mess we had to make in the code to have sync() work the way it does, since it was neither a regular read nor a regular quorum write. I don't know why you say that it changes the behavior. It changes the internal behavior, but the expected behavior exposed through the API call remains the same, so no user should need to care about it; it doesn't break any code.

-Flavio



Re: Changing sync() to need quorum ack

Posted by Michi Mutsuzaki <mi...@cs.stanford.edu>.
+1 Thank you for looking into this Hongchao.
