Posted to user@zookeeper.apache.org by chen dongming <ca...@hotmail.com> on 2018/07/17 10:13:04 UTC

Zookeeper consistency

hi,

I am reading the old thread at
http://comments.gmane.org/gmane.comp.java.hadoop.zookeeper.user/5221.

I have 2 questions.


Question 1:

I agree that ZooKeeper is sequentially consistent, but I don't think
ZooKeeper is linearizable, even if every read is preceded by a sync
(sync+read). For example:

timeline ------------>

C1:                sync()                  r1(x)

C2: w2(x=1)             w2(x=2)

It is possible for r1 to return x=1. This history is not linearizable
according to Herlihy's definition.
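
The claim about this history can be checked mechanically. The following is a minimal sketch, not anything taken from ZooKeeper: it brute-forces every ordering of the four operations in the timeline and tests it against register semantics. The integer timestamps are assumptions that only encode the left-to-right order drawn in the timeline, and the names `Op`, `history`, `linearizable` and `sequentially_consistent` are made up for illustration.

```python
from itertools import permutations
from typing import NamedTuple, Optional


class Op(NamedTuple):
    client: str
    name: str
    kind: str              # "write", "read" or "sync"
    value: Optional[int]   # value written, or value the read returned
    invoked: int           # assumed invocation time
    returned: int          # assumed response time


def history(read_value):
    """The timeline from the message, with r1(x) returning read_value."""
    return [
        Op("C2", "w2(x=1)", "write", 1, 0, 1),
        Op("C1", "sync()", "sync", None, 2, 3),
        Op("C2", "w2(x=2)", "write", 2, 4, 5),
        Op("C1", "r1(x)", "read", read_value, 6, 7),
    ]


def legal_register(order):
    """Every read returns the most recently written value in this order."""
    value = None
    for op in order:
        if op.kind == "write":
            value = op.value
        elif op.kind == "read" and op.value != value:
            return False
    return True


def respects_real_time(order):
    """If a returned before b was invoked, a must precede b."""
    return all(not (later.returned < earlier.invoked)
               for i, earlier in enumerate(order)
               for later in order[i + 1:])


def respects_program_order(order):
    """Each client's own operations keep their invocation order."""
    return all(not (a.client == b.client and b.invoked < a.invoked)
               for i, a in enumerate(order)
               for b in order[i + 1:])


def linearizable(h):
    return any(legal_register(p) and respects_real_time(p)
               for p in permutations(h))


def sequentially_consistent(h):
    return any(legal_register(p) and respects_program_order(p)
               for p in permutations(h))


print(linearizable(history(1)))             # r1 returns the stale value
print(sequentially_consistent(history(1)))
print(linearizable(history(2)))             # r1 returns the latest value
```

Under these assumed timestamps, the only difference between the two checks is which ordering constraint they enforce: real-time order across all clients for linearizability, per-client order only for sequential consistency.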


Question 2:

I think that ZooKeeper is linearizable only if there are only write operations.


Best regards,
Dominic



Re: Zookeeper consistency

Posted by Michael Han <ha...@apache.org>.
>> I think that ZooKeeper is linearizable only if there are only write
>> operations.

Yes, writes are linearizable because they are totally ordered globally.

A linearizable read must return the latest write in the system as of the
point in time the read is issued. By this definition a ZK read is not
linearizable, even with sync+read, because sync and read are two separate
operations and a write can commit between them. If we provided a quorum
read (so that read operations are broadcast globally as well), reads would
be linearizable too, but no practical system would do that, for performance
reasons.
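
To make the "separate operations" point concrete, here is a toy model of one leader and one follower. It is only a sketch under loud assumptions: the classes `Leader` and `Follower` are hypothetical and do not mirror ZooKeeper's code, and the follower here catches up only when `sync()` is called, which exaggerates the lag a real follower would have.

```python
class Leader:
    """Toy leader: holds the globally ordered list of committed writes to x."""

    def __init__(self):
        self.log = []

    def write(self, value):
        self.log.append(value)
        return len(self.log)  # toy transaction id (position in the log)


class Follower:
    """Toy follower: serves reads from local state, which may lag the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0   # number of leader log entries applied locally
        self.x = None

    def sync(self):
        # Catch up to whatever the leader has committed at this moment.
        for value in self.leader.log[self.applied:]:
            self.x = value
        self.applied = len(self.leader.log)

    def read(self):
        # Local read: no communication with the leader.
        return self.x


leader = Leader()
follower = Follower(leader)

leader.write(1)          # C2: w2(x=1)
follower.sync()          # C1: sync()
leader.write(2)          # C2: w2(x=2) commits between sync and read
stale = follower.read()  # C1: r1(x) sees the value as of the sync

follower.sync()
fresh = follower.read()
print(stale, fresh)
```

In this model the read reflects the state at the time of the sync, not at the time of the read, which is the gap described above.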



Re: document for zk internals

Posted by Michael Han <ha...@apache.org>.
>> which mentioned LeaderElection and FastLeaderElection.

The document here is a little bit outdated. We deprecated the old LE
implementation (LeaderElection) after the 3.4.0 release, so the only leader
election implementation in use (for both stable 3.4.x and 3.5/6) is now
FastLeaderElection.

>> I want to dig into the "Leader election". Any document for it?

The best document is probably the source code itself. Also, the book
"ZooKeeper: Distributed Process Coordination" has a "ZooKeeper Internal"
section that sketches the algorithms used in FLE.
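
As a starting point for reading the source, the core of how FLE compares two votes can be sketched as follows. This is a simplified reading of the `totalOrderPredicate` method in `FastLeaderElection`, not a copy of it: the names `Vote`, `supersedes` and `best_vote` are made up here, and quorum weights, notification exchange, and election rounds are all left out.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    epoch: int  # epoch of the proposed leader
    zxid: int   # last transaction id seen by the proposed leader
    sid: int    # server id of the proposed leader


def supersedes(new: Vote, cur: Vote) -> bool:
    """True if a server should switch from cur to new.

    Higher epoch wins; on a tie the higher zxid wins; on a further tie
    the higher server id wins. Tuple comparison is lexicographic, so it
    expresses exactly that ordering.
    """
    return (new.epoch, new.zxid, new.sid) > (cur.epoch, cur.zxid, cur.sid)


def best_vote(votes):
    """The vote every server converges on once it has seen all of them."""
    best = votes[0]
    for vote in votes[1:]:
        if supersedes(vote, best):
            best = vote
    return best


votes = [Vote(1, 5, 1), Vote(1, 7, 2), Vote(2, 3, 3)]
print(best_vote(votes))
```

The ordering favors the server with the most recent history, so the elected leader is one that already has the latest transactions it has seen proposed.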

>> My other understanding is that LeaderElection and FastLeaderElection in
>> the "internals" doc don't include phases 1, 2, 3, right?

Yes, I think FLE is phase 0: a preliminary phase that must finish before
the discovery / sync phases begin.

Also, it's probably worth noting that the paper "Zab: High-performance
broadcast for primary-backup systems" describes an older version (pre-1.0)
of ZAB. The ZAB used in 3.4 and 3.5/6 is ZAB 1.0. We have some
documentation on the differences between the two:

https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+Pre+1.0
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0


On Thu, Oct 4, 2018 at 8:27 AM chen dongming <ca...@hotmail.com> wrote:

> I had read 2 papers of zab. My question derive from them.
>
> <Zab: High-performance broadcast for primary-backup systems> mentioned:
>
> "Leader election occurs in two stages. First, we run a leader
> election algorithm that outputs a new process as the leader.
> We can use any protocol that, with high probability, chooses
> a process that is up and that a quorum of processes selects."
>
> But I don't think the detail of this "Leader election" is included in 2
> papers. According to my understanding, "Leader election" happen before
> phase 1,2,3 through the oracle. The paper describe phase 1,2,3 in
> detail, but not for "Leader election" and oracle.
>
> I want to dig into the "Leader election". Any document for it?
>
> Another my understanding, LeaderElection and FastLeaderElection in the
> "internals" doc doesn't include phase 1,2,3, right?
>
>
> On 2018/10/4 11:00 PM, Chris wrote:
> > You can find a lot of info if you search for Zookeeper zab protocol.
> >
> >
> > http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf
> >
> >
> > http://web.stanford.edu/class/cs347/reading/zab.pdf
> >
> > Regards
> > Chris
> >
> > On 4 October 2018 4:29:21 pm chen dongming <ca...@hotmail.com> wrote:
> >
> >> Hi,
> >>
> >> I am reading Zookeeper
> >> Internals(http://zookeeper.apache.org/doc/current/zookeeperInternals.html)
> >>
> >> which mentioned LeaderElection and FastLeaderElection.
> >>
> >> Is there any more document about LeaderElection and FastLeaderElection?
> >> Such as mechanism of them? Or how are they implement?
> >>
> >> Dominic
> >
> >
> >
>
>

Re: document for zk internals

Posted by chen dongming <ca...@hotmail.com>.
I have read the two Zab papers. My questions derive from them.

The paper "Zab: High-performance broadcast for primary-backup systems" says:

"Leader election occurs in two stages. First, we run a leader
election algorithm that outputs a new process as the leader.
We can use any protocol that, with high probability, chooses
a process that is up and that a quorum of processes selects."

But I don't think the details of this "leader election" are covered in
either paper. As I understand it, leader election happens before
phases 1, 2, 3, through the oracle. The paper describes phases 1, 2, 3 in
detail, but not leader election or the oracle.

I want to dig into leader election. Is there any documentation for it?

My other understanding is that LeaderElection and FastLeaderElection in the
"internals" doc don't include phases 1, 2, 3, right?


On 2018/10/4 11:00 PM, Chris wrote:
> You can find a lot of info if you search for Zookeeper zab protocol.
>
> http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf 
>
>
> http://web.stanford.edu/class/cs347/reading/zab.pdf
>
> Regards
> Chris
>
> On 4 October 2018 4:29:21 pm chen dongming <ca...@hotmail.com> wrote:
>
>> Hi,
>>
>> I am reading Zookeeper
>> Internals(http://zookeeper.apache.org/doc/current/zookeeperInternals.html) 
>>
>> which mentioned LeaderElection and FastLeaderElection.
>>
>> Is there any more document about LeaderElection and FastLeaderElection?
>> Such as mechanism of them? Or how are they implement?
>>
>> Dominic
>
>
>


Re: document for zk internals

Posted by Chris <c....@gmail.com>.
You can find a lot of info if you search for the ZooKeeper Zab protocol.

http://diyhpl.us/~bryan/papers2/distributed/distributed-systems/zab.totally-ordered-broadcast-protocol.2008.pdf

http://web.stanford.edu/class/cs347/reading/zab.pdf

Regards
Chris

On 4 October 2018 4:29:21 pm chen dongming <ca...@hotmail.com> wrote:

> Hi,
>
> I am reading Zookeeper
> Internals(http://zookeeper.apache.org/doc/current/zookeeperInternals.html)
> which mentioned LeaderElection and FastLeaderElection.
>
> Is there any more document about LeaderElection and FastLeaderElection?
> Such as mechanism of them? Or how are they implement?
>
> Dominic




document for zk internals

Posted by chen dongming <ca...@hotmail.com>.
Hi,

I am reading ZooKeeper Internals
(http://zookeeper.apache.org/doc/current/zookeeperInternals.html),
which mentions LeaderElection and FastLeaderElection.

Is there any more documentation about LeaderElection and FastLeaderElection,
such as their mechanisms or how they are implemented?

Dominic