Posted to user@zookeeper.apache.org by daidong <da...@gmail.com> on 2011/04/20 11:37:37 UTC

Problems about Zab protocol

Hi, everyone. 

Recently I read the paper "A simple totally ordered broadcast protocol" and
there are some points I cannot figure out. I hope someone can help me... :P

The paper describes the Zab protocol as a two-phase commit (2PC) protocol when
the system is in broadcast mode. However, another paper (Skeen 82, "A Quorum Based
Commit Protocol") says that to adapt 2PC into a quorum-based commit protocol we
must introduce a three-phase commit protocol (in fact, I haven't quite understood
this :( ). Yet according to the Zab paper, this can still be done. Why, and how?

Secondly, ZooKeeper guarantees that the state of different followers is
consistent. However, this consistency only holds among the quorum of
followers that have acked the COMMIT. Since a client can connect to any
follower when reading, what happens if the client happens to connect to a
follower that has not acked the COMMIT? I cannot find this information in
the paper...

If my questions are naive, I hope someone can tell me where I can find the
answers, or offer some suggestions. Thanks :)


--
View this message in context: http://zookeeper-user.578899.n2.nabble.com/Problems-about-Zab-protocol-tp6290102p6290102.html
Sent from the zookeeper-user mailing list archive at Nabble.com.

Re: Problems about Zab protocol

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Daidong, There are several key differences between distributed  
transactions and the replication problem we solve in ZooKeeper, and if  
you are interested in understanding them, you might start by having a  
look at the Paxos Commit work of Gray and Lamport. They have a TR  
available online, just use your favorite search engine.

-Flavio


flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: RE: RE: Problems about Zab protocol

Posted by daidong <da...@gmail.com>.

Hi, Alex

Thanks for your reply, and for Flavio's.

I think I finally get the idea. :)

Would it be appropriate to see ZAB as a 3PC without the READY/WAIT state, since all the participants reply VOTE_COMMIT (they never abort)?

I will read the source code and hope to do some work with ZAB. Thanks a lot for all the replies.
-- 
daidong

RE: RE: Problems about Zab protocol

Posted by Alexander Shraer <sh...@yahoo-inc.com>.

Hi Daidong,

In addition to Flavio's response, I'll try to address some of your specific questions.

> In my opinion, an atomic broadcast protocol must guarantee all the non-
> faulty servers have the same status eventually. So in the 2PC protocol,
> the coordinator must block until "all" the servers reply "ok".

Designed this way, the protocol wouldn't be able to tolerate any failures: the leader could block
waiting for a response from a server that had crashed. The idea is to receive enough "ok" messages
to guarantee that even if a minority of servers crash, the information is not lost. That's why
the leader waits for acks from a majority. Messages are still sent to all followers, so they will eventually
get them (or, if they disconnect, they will later reconnect and sync with the leader automatically).
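
The majority rule described above can be sketched in a few lines of Python. This is only a toy illustration, not ZooKeeper's implementation: the names quorum_size, can_commit and the server ids s1..s5 are made up for the example.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_commit(acks, ensemble):
    """A proposal may be committed once a majority of the ensemble has acked it."""
    return len(set(acks) & set(ensemble)) >= quorum_size(len(ensemble))


# Hypothetical five-server ensemble; s4 and s5 are slow or crashed.
ensemble = {"s1", "s2", "s3", "s4", "s5"}

print(quorum_size(len(ensemble)))                  # 3
print(can_commit({"s1", "s2", "s3"}, ensemble))    # True: 3 of 5 is a majority
print(can_commit({"s1", "s2"}, ensemble))          # False: 2 of 5 is not
```

Because any two majorities of the same ensemble share at least one server, a proposal acked by a majority survives the crash of any minority, which is the point of not waiting for "all" servers.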

Regarding your second question: formally, sequential consistency guarantees that the operations of each client take effect in the order
they were submitted by that client, so a client's read is guaranteed to see its own last completed write.
In the example you mention, the client first executes a create() and then getChildren(). If clients C1 and C2 both submit a create()
concurrently, the leader will schedule one of these requests before the other; suppose it is C1's create().
Then, when C2 is notified about the completion of its own create(), FIFO ordering ensures that it has also found out about any operation that completed before that create()
(these messages were sent by the leader earlier). So when C2 finally runs getChildren(), its local state will already include every operation that was scheduled
by the leader before its own create() completed.
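
The C1/C2 argument above can be replayed with a small toy model. This is a sketch under simplifying assumptions, not ZooKeeper code: the Leader and Follower classes, apply_up_to and the node names are hypothetical, and a list index stands in for the leader's ordering of operations.

```python
class Leader:
    def __init__(self):
        self.log = []  # operations in the order the leader scheduled them

    def create(self, node):
        self.log.append(node)
        return len(self.log)  # position in the total order


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.applied = 0   # number of log entries applied locally
        self.children = []

    def apply_up_to(self, position):
        # FIFO delivery: entries are applied strictly in leader order, with no gaps.
        while self.applied < position:
            self.children.append(self.leader.log[self.applied])
            self.applied += 1

    def get_children(self):
        return sorted(self.children)  # local read, no round trip to the leader


leader = Leader()
f2 = Follower(leader)  # the follower that C2 is connected to

leader.create("lock-0000000001")           # C1's create(), scheduled first
pos_c2 = leader.create("lock-0000000002")  # C2's create(), scheduled second

# C2 learns that its create() completed only after its follower applied it,
# which under FIFO means everything scheduled earlier was applied as well.
f2.apply_up_to(pos_c2)
print(f2.get_children())  # ['lock-0000000001', 'lock-0000000002']
```

In this model C2's getChildren() already contains C1's lower-numbered node, so C2 cannot wrongly conclude that it holds the lock, even though it never called sync().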

In general, ZAB implements state-machine replication by executing consensus on each operation. To understand the general idea, 
I recommend reading Lamport's "Paxos made simple" paper I sent earlier - it has a constructive explanation of this 
(although the algorithm is somewhat different from ZAB).

Alex

Re: RE: Problems about Zab protocol

Posted by daidong <da...@gmail.com>.

Hi, Alex

Thanks for your reply. :)

I know ZAB has two modes, but what I do not quite understand concerns the broadcast mode. The ZAB paper says ZAB is a simplified version of the two-phase commit protocol because followers never abort. I do not quite understand this.

In my opinion, an atomic broadcast protocol must guarantee that all non-faulty servers eventually reach the same state. So in the 2PC protocol, the coordinator must block until "all" the servers reply "ok". If there is no abort either, consider a very slow follower F that processes messages more slowly than the other followers. Given TCP and FIFO channels, all the messages will be processed in order at F; however, messages will accumulate if the coordinator keeps broadcasting. What happens if the receive buffer in F overflows?

Is there any mechanism I have not noticed that avoids this situation in ZAB?

About my second question: I read the consistency guarantees section, thanks for the tip. I still have a question: if ZooKeeper does not ensure that all clients see the latest value, how does the lock mechanism work? I checked the recipe example code in ZooKeeper 3.3.3: when a client tries to get the write lock, it does not call sync() before calling getChildren(). If another client has created an ephemeral node with the lowest sequence-number suffix, this client may not see it, because getChildren() does not sync with the leader. Is it possible that two clients both think they hold the lock?

Thanks for any words. :)
-- 
daidong
Sent with Sparrow
On Thursday, April 21, 2011 at 2:30 AM, Alexander Shraer [via zookeeper-user] wrote: 
>  Hi, 
> 
> Regarding your first question - ZAB has two parts - the broadcast protocol you mention, 
> which is executed by a leader, and the leader election protocol, which recovers from a leader failure. 
> This is similar to the way other state-machine replication algorithms work, where you have 
> a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails). 
> See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
> 
> Regarding your second question - ZooKeeper guarantees so-called "sequential consistency" semantics. 
> This guarantees that the real execution looks to clients like some sequential execution in which 
> the operations of every client appear in the order they were submitted. It does not guarantee that a read by one client 
> returns the latest value written by another client. This allows reads to be executed locally. If you need to read the latest 
> state, you can use the sync() call, which flushes the pending updates between the leader and a follower. 
> See also the "consistency guarantees" section here: 
> http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
> 
> Alex 
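
The difference between a local read and a read after sync(), as described in the reply above, can be illustrated with a toy model. This is a sketch only: ToyLeader and ToyFollower are hypothetical classes, and the real ZooKeeper sync() is an asynchronous client call that takes a path, which this model does not attempt to reproduce.

```python
class ToyLeader:
    def __init__(self):
        self.committed = {}  # latest committed value per path

    def write(self, path, value):
        self.committed[path] = value


class ToyFollower:
    def __init__(self, leader):
        self.leader = leader
        self.state = {}  # this follower's local copy, possibly behind the leader

    def read(self, path):
        return self.state.get(path)  # served locally, so it may be stale

    def sync(self):
        # Catch up with everything the leader has committed so far.
        self.state = dict(self.leader.committed)


leader = ToyLeader()
lagging = ToyFollower(leader)

leader.write("/config", "v2")    # written by another client via the leader

print(lagging.read("/config"))   # None: the follower has not caught up yet
lagging.sync()
print(lagging.read("/config"))   # v2: the read after sync() sees the update
```

The model shows why reads can be served locally and cheaply, and why a client that needs the latest value written by someone else has to pay for a sync() first.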
> 
> > -----Original Message----- 
> > From: daidong [mailto:[hidden email]] 
> > Sent: Wednesday, April 20, 2011 2:38 AM 
> > To: [hidden email] 
> > Subject: Problems about Zab protocol 
> > 
> > Hi, everyone. 
> > 
> > Recently, i read the paper "a simple total ordered broadcast protocol" 
> > and 
> > there are some problems i can not figure out. Hope anyone can help 
> > me... :P 
> > 
> > The paper describes the Zab protocol as a 2 phase commit protocol when 
> > system is under broadcast mode. However some paper(Skeen 82, "A Quorum 
> > Based 
> > Commit Protocol") has mentioned if we want to extend an 2PC to adapt a 
> > quorum based commit protocol we must introduce a three phase commit 
> > protocol(In fact, i haven't quit understood this, :( ). However 
> > according 
> > Zab paper, this still can be done. Why and how to do this? 
> > 
> > Secondly, even Zookeeper can guarantee that status in different 
> > followers 
> > are consistent. However, this consistency only works among a quorum of 
> > followers that has acked the COMMIT. As the client can connect to any 
> > followers when perform reading action, so what happens if the client 
> > happens 
> > to connect with the follower that has not acked the COMMIT? I can not 
> > find 
> > the information in this paper... 
> > 
> > If i ask some naive question, Hope anybody can tell me where i can find 
> > the 
> > answer or some suggestions, thanks :) 
> > 
> > 
> > -- 
> > View this message in context: http://zookeeper-
> > user.578899.n2.nabble.com/Problems-about-Zab-protocol- 
> > tp6290102p6290102.html 
> > Sent from the zookeeper-user mailing list archive at Nabble.com. 
> 
> 
> 
> 
> 




Re: Problems about Zab protocol

Posted by daidong <da...@gmail.com>.

Thanks for the pointer to this paper. It is quite formal and not easy to understand.

Still reading... :P

-- 
daidong
Sent with Sparrow
On Thursday, April 21, 2011 at 7:21 AM, André Oriani [via zookeeper-user] wrote:
>  If you wanna go deep on Zab http://research.yahoo.com/files/YL-2010-007.pdf
> 
> - 
> André 
> 
> On Wed, Apr 20, 2011 at 17:26, Benjamin Reed <[hidden email]> wrote: 
> 
> > just to add a bit to alex's response: we do a simplified 2pc since we
> > do not have aborts. we also differ from 2pc during recovery which is 
> > made up of two sub phases. 
> > 
> > ben 
> > 
> > On Wed, Apr 20, 2011 at 11:29 AM, Alexander Shraer 
> > <[hidden email]> wrote: 
> >> Hi, 
> >> 
> >> Regarding your first question - ZAB has two parts - the broadcast protocol you mention, 
> >> which is executed by a leader, and the leader election protocol, which recovers from a leader failure. 
> >> This is similar to the way other state-machine replication algorithms work, where you have 
> >> a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails). 
> >> See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
> >> 
> >> Regarding your second question - Zookeeper basically guarantees so called "sequential consistency" semantics. 
> >> This guarantees that the real execution looks to clients like some sequential execution in which 
> >> the operations of every client appear in the order they were submitted. It does not guarantee that a read of one client 
> >> returns the latest value written by another client. This allows reads to be executed locally. If you need to return the latest 
> >> state, you can use the sync() call which flushes the pending updates between the leader and a follower. 
> >> See also the "consistency guarantees" section here: 
> >> http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
> >> 
> >> Alex 
> >> 
> >>> -----Original Message----- 
> >>> From: daidong [mailto:[hidden email]] 
> >>> Sent: Wednesday, April 20, 2011 2:38 AM 
> >>> To: [hidden email] 
> >>> Subject: Problems about Zab protocol 
> >>> 
> >>> Hi, everyone. 
> >>> 
> >>> Recently, i read the paper "a simple total ordered broadcast protocol" 
> >>> and 
> >>> there are some problems i can not figure out. Hope anyone can help 
> >>> me... :P 
> >>> 
> >>> The paper describes the Zab protocol as a 2 phase commit protocol when 
> >>> system is under broadcast mode. However some paper(Skeen 82, "A Quorum 
> >>> Based 
> >>> Commit Protocol") has mentioned if we want to extend an 2PC to adapt a 
> >>> quorum based commit protocol we must introduce a three phase commit 
> >>> protocol(In fact, i haven't quit understood this, :( ). However 
> >>> according 
> >>> Zab paper, this still can be done. Why and how to do this? 
> >>> 
> >>> Secondly, even Zookeeper can guarantee that status in different 
> >>> followers 
> >>> are consistent. However, this consistency only works among a quorum of 
> >>> followers that has acked the COMMIT. As the client can connect to any 
> >>> followers when perform reading action, so what happens if the client 
> >>> happens 
> >>> to connect with the follower that has not acked the COMMIT? I can not 
> >>> find 
> >>> the information in this paper... 
> >>> 
> >>> If i ask some naive question, Hope anybody can tell me where i can find 
> >>> the 
> >>> answer or some suggestions, thanks :) 
> >>> 
> >>> 
> >>> -- 
> >>> View this message in context: http://zookeeper-
> >>> user.578899.n2.nabble.com/Problems-about-Zab-protocol- 
> >>> tp6290102p6290102.html 
> >>> Sent from the zookeeper-user mailing list archive at Nabble.com. 
> >> 
> > 
> 
> 
> 
> 
> 




Re: Problems about Zab protocol

Posted by André Oriani <ra...@students.ic.unicamp.br>.

If you want to go deeper on Zab, see http://research.yahoo.com/files/YL-2010-007.pdf

-
André

On Wed, Apr 20, 2011 at 17:26, Benjamin Reed <br...@apache.org> wrote:
> just to add a bit to alex's response: we do a simplified 2pc since we
> do not have aborts. we also differ from 2pc during recovery which is
> made up of two sub phases.
>
> ben
>
> On Wed, Apr 20, 2011 at 11:29 AM, Alexander Shraer
> <sh...@yahoo-inc.com> wrote:
>> Hi,
>>
>> Regarding your first question - ZAB has two parts - the broadcast protocol you mention,
>> which is executed by a leader, and the leader election protocol, which recovers from a leader failure.
>> This is similar to the way other state-machine replication algorithms work, where you have
>> a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails).
>> See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
>>
>> Regarding your second question - Zookeeper basically guarantees so called "sequential consistency" semantics.
>> This guarantees that the real execution looks to clients like some sequential execution in which
>> the operations of every client appear in the order they were submitted. It does not guarantee that a read of one client
>> returns the latest value written by another client. This allows reads to be executed locally.  If you need to return the latest
>> state, you can use the sync() call which flushes the pending updates between the leader and a follower.
>> See also the "consistency guarantees" section here:
>> http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
>>
>> Alex
>>
>>> -----Original Message-----
>>> From: daidong [mailto:daidongly@gmail.com]
>>> Sent: Wednesday, April 20, 2011 2:38 AM
>>> To: zookeeper-user@hadoop.apache.org
>>> Subject: Problems about Zab protocol
>>>
>>> Hi, everyone.
>>>
>>> Recently, i read the paper "a simple total ordered broadcast protocol"
>>> and
>>> there are some problems i can not figure out. Hope anyone can help
>>> me... :P
>>>
>>> The paper describes the Zab protocol as a 2 phase commit protocol when
>>> system is under broadcast mode. However some paper(Skeen 82, "A Quorum
>>> Based
>>> Commit Protocol") has mentioned if we want to extend an 2PC to adapt a
>>> quorum based commit protocol we must introduce a three phase commit
>>> protocol(In fact, i haven't quit understood this, :( ). However
>>> according
>>> Zab paper, this still can be done. Why and how to do this?
>>>
>>> Secondly, even Zookeeper can guarantee that status in different
>>> followers
>>> are consistent. However, this consistency only works among a quorum of
>>> followers that has acked the COMMIT. As the client can connect to any
>>> followers when perform reading action, so what happens if the client
>>> happens
>>> to connect with the follower that has not acked the COMMIT? I can not
>>> find
>>> the information in this paper...
>>>
>>> If i ask some naive question, Hope anybody can tell me where i can find
>>> the
>>> answer or some suggestions, thanks :)
>>>
>>>
>>> --
>>> View this message in context: http://zookeeper-
>>> user.578899.n2.nabble.com/Problems-about-Zab-protocol-
>>> tp6290102p6290102.html
>>> Sent from the zookeeper-user mailing list archive at Nabble.com.
>>
>

Re: Problems about Zab protocol

Posted by Flavio Junqueira <fp...@yahoo-inc.com>.

Daidong, The comment in the paper just refers to the communication  
pattern, which resembles 2PC and exists in many other replication  
protocols. Let me make it clear that it was not our intention to say  
that we are implementing state-machine replication with 2PC, which is  
possibly the source of the confusion.

-Flavio

On Apr 21, 2011, at 8:35 AM, daidong wrote:

> This is the idea I cannot get :)
>
> Why does not having aborts simplify 2PC without any effect on
> integrity?
>
> Thanks!
> -- 
> daidong
> Sent with Sparrow
> On Thursday, April 21, 2011 at 4:26 AM, Benjamin Reed-3 [via
> zookeeper-user] wrote:
>> just to add a bit to alex's response: we do a simplified 2pc since we
>> do not have aborts. we also differ from 2pc during recovery which is
>> made up of two sub phases.
>>
>> ben
>>
>> On Wed, Apr 20, 2011 at 11:29 AM, Alexander Shraer
>> <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> Regarding your first question - ZAB has two parts - the broadcast  
>>> protocol you mention,
>>> which is executed by a leader, and the leader election protocol,  
>>> which recovers from a leader failure.
>>> This is similar to the way other state-machine replication  
>>> algorithms work, where you have
>>> a fast normal mode and a slower recovery mode (you don't need to  
>>> execute both all the time - only when the leader fails).
>>> See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
>>>
>>> Regarding your second question - Zookeeper basically guarantees so  
>>> called "sequential consistency" semantics.
>>> This guarantees that the real execution looks to clients like some  
>>> sequential execution in which
>>> the operations of every client appear in the order they were  
>>> submitted. It does not guarantee that a read of one client
>>> returns the latest value written by another client. This allows  
>>> reads to be executed locally. If you need to return the latest
>>> state, you can use the sync() call which flushes the pending  
>>> updates between the leader and a follower.
>>> See also the "consistency guarantees" section here:
>>> http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
>>>
>>> Alex
>>>
>>>> -----Original Message-----
>>>> From: daidong [mailto:[hidden email]]
>>>> Sent: Wednesday, April 20, 2011 2:38 AM
>>>> To: [hidden email]
>>>> Subject: Problems about Zab protocol
>>>>
>>>> Hi, everyone.
>>>>
>>>> Recently, i read the paper "a simple total ordered broadcast  
>>>> protocol"
>>>> and
>>>> there are some problems i can not figure out. Hope anyone can help
>>>> me... :P
>>>>
>>>> The paper describes the Zab protocol as a 2 phase commit protocol  
>>>> when
>>>> system is under broadcast mode. However some paper(Skeen 82, "A  
>>>> Quorum
>>>> Based
>>>> Commit Protocol") has mentioned if we want to extend an 2PC to  
>>>> adapt a
>>>> quorum based commit protocol we must introduce a three phase commit
>>>> protocol(In fact, i haven't quit understood this, :( ). However
>>>> according
>>>> Zab paper, this still can be done. Why and how to do this?
>>>>
>>>> Secondly, even Zookeeper can guarantee that status in different
>>>> followers
>>>> are consistent. However, this consistency only works among a  
>>>> quorum of
>>>> followers that has acked the COMMIT. As the client can connect to  
>>>> any
>>>> followers when perform reading action, so what happens if the  
>>>> client
>>>> happens
>>>> to connect with the follower that has not acked the COMMIT? I can  
>>>> not
>>>> find
>>>> the information in this paper...
>>>>
>>>> If i ask some naive question, Hope anybody can tell me where i  
>>>> can find
>>>> the
>>>> answer or some suggestions, thanks :)
>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://zookeeper-
>>>> user.578899.n2.nabble.com/Problems-about-Zab-protocol-
>>>> tp6290102p6290102.html
>>>> Sent from the zookeeper-user mailing list archive at Nabble.com.
>>>
>>
>>
>>
>>
>>
>
>
>

flavio
junqueira

research scientist

fpj@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

Re: Problems about Zab protocol

Posted by daidong <da...@gmail.com>.

This is the idea I cannot get :)

Why does not having aborts simplify 2PC without any effect on integrity?

Thanks!
-- 
daidong
Sent with Sparrow
On Thursday, April 21, 2011 at 4:26 AM, Benjamin Reed-3 [via zookeeper-user] wrote:
>  just to add a bit to alex's response: we do a simplified 2pc since we
> do not have aborts. we also differ from 2pc during recovery which is 
> made up of two sub phases. 
> 
> ben 
> 
> On Wed, Apr 20, 2011 at 11:29 AM, Alexander Shraer 
> <[hidden email]> wrote: 
> 
> > Hi, 
> > 
> > Regarding your first question - ZAB has two parts - the broadcast protocol you mention, 
> > which is executed by a leader, and the leader election protocol, which recovers from a leader failure. 
> > This is similar to the way other state-machine replication algorithms work, where you have 
> > a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails). 
> > See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
> > 
> > Regarding your second question - Zookeeper basically guarantees so called "sequential consistency" semantics. 
> > This guarantees that the real execution looks to clients like some sequential execution in which 
> > the operations of every client appear in the order they were submitted. It does not guarantee that a read of one client 
> > returns the latest value written by another client. This allows reads to be executed locally. If you need to return the latest 
> > state, you can use the sync() call which flushes the pending updates between the leader and a follower. 
> > See also the "consistency guarantees" section here: 
> > http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
> > 
> > Alex 
> > 
> >> -----Original Message----- 
> >> From: daidong [mailto:[hidden email]] 
> >> Sent: Wednesday, April 20, 2011 2:38 AM 
> >> To: [hidden email] 
> >> Subject: Problems about Zab protocol 
> >> 
> >> Hi, everyone. 
> >> 
> >> Recently, i read the paper "a simple total ordered broadcast protocol" 
> >> and 
> >> there are some problems i can not figure out. Hope anyone can help 
> >> me... :P 
> >> 
> >> The paper describes the Zab protocol as a 2 phase commit protocol when 
> >> system is under broadcast mode. However some paper(Skeen 82, "A Quorum 
> >> Based 
> >> Commit Protocol") has mentioned if we want to extend an 2PC to adapt a 
> >> quorum based commit protocol we must introduce a three phase commit 
> >> protocol(In fact, i haven't quit understood this, :( ). However 
> >> according 
> >> Zab paper, this still can be done. Why and how to do this? 
> >> 
> >> Secondly, even Zookeeper can guarantee that status in different 
> >> followers 
> >> are consistent. However, this consistency only works among a quorum of 
> >> followers that has acked the COMMIT. As the client can connect to any 
> >> followers when perform reading action, so what happens if the client 
> >> happens 
> >> to connect with the follower that has not acked the COMMIT? I can not 
> >> find 
> >> the information in this paper... 
> >> 
> >> If i ask some naive question, Hope anybody can tell me where i can find 
> >> the 
> >> answer or some suggestions, thanks :) 
> >> 
> >> 
> >> -- 
> >> View this message in context: http://zookeeper-
> >> user.578899.n2.nabble.com/Problems-about-Zab-protocol- 
> >> tp6290102p6290102.html 
> >> Sent from the zookeeper-user mailing list archive at Nabble.com. 
> > 
> 
> 
> 
> 
> 




Re: Problems about Zab protocol

Posted by Benjamin Reed <br...@apache.org>.

just to add a bit to alex's response: we do a simplified 2pc since we
do not have aborts. we also differ from 2pc during recovery, which is
made up of two sub-phases.
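
A minimal sketch of the "simplified 2pc without aborts" idea, as a toy in-memory model rather than ZooKeeper's actual code. The names `Follower`, `broadcast`, `on_propose` and `on_commit` are hypothetical, and the leader counting its own ack toward the quorum is an assumption of this sketch. The point it illustrates: a follower can only ack a proposal, never vote to abort it, so the leader commits as soon as a majority has acked and does not wait for everyone.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    name: str
    reachable: bool = True
    log: list = field(default_factory=list)        # accepted proposals, in order
    committed: list = field(default_factory=list)  # delivered proposals, in order

    def on_propose(self, zxid, value):
        # There is no "abort" vote: the follower logs the proposal and acks it.
        self.log.append((zxid, value))
        return True

    def on_commit(self, zxid):
        # Deliver the logged proposal that carries this zxid.
        for entry in self.log:
            if entry[0] == zxid:
                self.committed.append(entry)


def broadcast(followers, zxid, value):
    """Propose (zxid, value); return True if a quorum acked and it was committed."""
    ensemble_size = len(followers) + 1   # the followers plus the leader itself
    acks = 1                             # assumption: the leader acks its own proposal
    acked = []
    for f in followers:
        if f.reachable and f.on_propose(zxid, value):
            acks += 1
            acked.append(f)
    if acks <= ensemble_size // 2:
        return False                     # no majority, so the leader cannot commit
    for f in acked:
        f.on_commit(zxid)
    return True
```

With a leader and four followers, two unreachable followers still leave three acks out of five, so `broadcast` commits; with three unreachable it returns False. In this toy model that is simply a failed call; what a real ensemble does when the leader loses its quorum belongs to the recovery part of the protocol and is not modelled here.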

ben

On Wed, Apr 20, 2011 at 11:29 AM, Alexander Shraer
<sh...@yahoo-inc.com> wrote:
> Hi,
>
> Regarding your first question - ZAB has two parts - the broadcast protocol you mention,
> which is executed by a leader, and the leader election protocol, which recovers from a leader failure.
> This is similar to the way other state-machine replication algorithms work, where you have
> a fast normal mode and a slower recovery mode (you don't need to execute both all the time - only when the leader fails).
> See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
>
> Regarding your second question - Zookeeper basically guarantees so called "sequential consistency" semantics.
> This guarantees that the real execution looks to clients like some sequential execution in which
> the operations of every client appear in the order they were submitted. It does not guarantee that a read of one client
> returns the latest value written by another client. This allows reads to be executed locally.  If you need to return the latest
> state, you can use the sync() call which flushes the pending updates between the leader and a follower.
> See also the "consistency guarantees" section here:
> http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
>
> Alex
>
>> -----Original Message-----
>> From: daidong [mailto:daidongly@gmail.com]
>> Sent: Wednesday, April 20, 2011 2:38 AM
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Problems about Zab protocol
>>
>> Hi, everyone.
>>
>> Recently, i read the paper "a simple total ordered broadcast protocol"
>> and
>> there are some problems i can not figure out. Hope anyone can help
>> me... :P
>>
>> The paper describes the Zab protocol as a 2 phase commit protocol when
>> system is under broadcast mode. However some paper(Skeen 82, "A Quorum
>> Based
>> Commit Protocol") has mentioned if we want to extend an 2PC to adapt a
>> quorum based commit protocol we must introduce a three phase commit
>> protocol(In fact, i haven't quit understood this, :( ). However
>> according
>> Zab paper, this still can be done. Why and how to do this?
>>
>> Secondly, even Zookeeper can guarantee that status in different
>> followers
>> are consistent. However, this consistency only works among a quorum of
>> followers that has acked the COMMIT. As the client can connect to any
>> followers when perform reading action, so what happens if the client
>> happens
>> to connect with the follower that has not acked the COMMIT? I can not
>> find
>> the information in this paper...
>>
>> If i ask some naive question, Hope anybody can tell me where i can find
>> the
>> answer or some suggestions, thanks :)
>>
>>
>> --
>> View this message in context: http://zookeeper-
>> user.578899.n2.nabble.com/Problems-about-Zab-protocol-
>> tp6290102p6290102.html
>> Sent from the zookeeper-user mailing list archive at Nabble.com.
>

RE: Problems about Zab protocol

Posted by Alexander Shraer <sh...@yahoo-inc.com>.

Hi,

Regarding your first question - ZAB has two parts - the broadcast protocol you mention,
which is executed by a leader, and the leader election protocol, which recovers from a leader failure.
This is similar to the way other state-machine replication algorithms work, where you have
a fast normal mode and a slower recovery mode (you don't need to execute both all the time - the recovery mode runs only when the leader fails).
See Paxos state-machine replication for example (section 3): http://research.microsoft.com/en-us/um/people/lamport/pubs/pubs.html#paxos-simple
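
The split between the two modes can be sketched as a tiny decision function. This is an illustrative model only: `Mode` and `next_mode` are hypothetical names, and the rule that the leader also needs a majority of the ensemble (itself included) to stay in broadcast mode is an assumption added here, since the text above only mentions leader failure.

```python
from enum import Enum


class Mode(Enum):
    BROADCAST = "broadcast"   # fast normal mode, driven by the leader
    RECOVERY = "recovery"     # slower mode, entered when the leader is lost


def next_mode(leader_alive, followers_in_sync, ensemble_size):
    """Pick the mode for an ensemble of `ensemble_size` servers."""
    quorum = ensemble_size // 2 + 1
    # Assumption: the leader counts itself toward the quorum.
    if leader_alive and followers_in_sync + 1 >= quorum:
        return Mode.BROADCAST
    return Mode.RECOVERY
```

For a five-server ensemble, a live leader with two synchronized followers stays in broadcast mode, while a failed leader always sends the ensemble to recovery regardless of how many followers are up.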

Regarding your second question - ZooKeeper basically guarantees so-called "sequential consistency" semantics.
This guarantees that the real execution looks to clients like some sequential execution in which
the operations of every client appear in the order they were submitted. It does not guarantee that a read by one client
returns the latest value written by another client. This allows reads to be executed locally. If you need to read the latest
state, you can first call sync(), which flushes the pending updates between the leader and the follower you are connected to.
See also the "consistency guarantees" section here:
http://hadoop.apache.org/zookeeper/docs/r3.3.1/zookeeperProgrammers.html
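
The stale-read case and the effect of sync() can be shown with a small in-memory model. It is a sketch under stated assumptions, not the ZooKeeper client API: `Leader`, `Follower`, `write`, `read` and `sync` are hypothetical names, and the synchronous `sync` here only models the effect of the real call, which is asynchronous in the Java client.

```python
class Leader:
    def __init__(self):
        self.committed = []   # committed updates as (zxid, key, value), in order

    def write(self, key, value):
        zxid = len(self.committed) + 1
        self.committed.append((zxid, key, value))
        return zxid


class Follower:
    def __init__(self, leader):
        self.leader = leader
        self.applied = 0      # how many of the leader's commits are applied locally
        self.data = {}

    def read(self, key):
        # Reads are answered from local state, with no round trip to the leader.
        return self.data.get(key)

    def sync(self):
        # Apply, in order, every commit this follower has not seen yet.
        for _zxid, key, value in self.leader.committed[self.applied:]:
            self.data[key] = value
        self.applied = len(self.leader.committed)


leader = Leader()
lagging = Follower(leader)
leader.write("/config", "v1")          # committed, but not yet applied by `lagging`
stale = lagging.read("/config")        # the follower has not caught up
lagging.sync()
fresh = lagging.read("/config")        # now reflects the committed write
```

Here `stale` is None and `fresh` is "v1": the lagging follower returns an older but still consistent view until it catches up, which is what sequential consistency permits.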

Alex

> -----Original Message-----
> From: daidong [mailto:daidongly@gmail.com]
> Sent: Wednesday, April 20, 2011 2:38 AM
> To: zookeeper-user@hadoop.apache.org
> Subject: Problems about Zab protocol
> 
> Hi, everyone.
> 
> Recently, i read the paper "a simple total ordered broadcast protocol"
> and
> there are some problems i can not figure out. Hope anyone can help
> me... :P
> 
> The paper describes the Zab protocol as a 2 phase commit protocol when
> system is under broadcast mode. However some paper(Skeen 82, "A Quorum
> Based
> Commit Protocol") has mentioned if we want to extend an 2PC to adapt a
> quorum based commit protocol we must introduce a three phase commit
> protocol(In fact, i haven't quit understood this, :( ). However
> according
> Zab paper, this still can be done. Why and how to do this?
> 
> Secondly, even Zookeeper can guarantee that status in different
> followers
> are consistent. However, this consistency only works among a quorum of
> followers that has acked the COMMIT. As the client can connect to any
> followers when perform reading action, so what happens if the client
> happens
> to connect with the follower that has not acked the COMMIT? I can not
> find
> the information in this paper...
> 
> If i ask some naive question, Hope anybody can tell me where i can find
> the
> answer or some suggestions, thanks :)
> 
> 
> --
> View this message in context: http://zookeeper-
> user.578899.n2.nabble.com/Problems-about-Zab-protocol-
> tp6290102p6290102.html
> Sent from the zookeeper-user mailing list archive at Nabble.com.