
Posted to user@zookeeper.apache.org by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk> on 2015/09/28 13:19:06 UTC

3-server Zab cluster

In Zab, assume we have a cluster consisting of 3 servers. To deliver a write request, it must run 3 communication steps: proposal, acknowledgement and commit.
As Zab uses reliable FIFO channels, it is possible to remove the commit round. As soon as a follower receives a proposal, it logs it, sends an ACK and commits locally. Upon receiving an ACK from any follower, the leader commits the proposal locally; no COMMIT message needs to be sent to followers. In this case, all servers commit a proposal in two communication steps, which reduces latency, particularly at the followers.

Note that this optimization can only work in a 3-server cluster (a follower completes a majority as soon as it acks).
Does anyone see any problems with such a (small) optimization?
Ibrahim
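
The majority argument behind the "3 servers only" restriction can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and it assumes the leader counts itself as one of the servers that has logged the proposal.

```python
def quorum_size(n: int) -> int:
    """Smallest strict majority of an ensemble of n servers."""
    return n // 2 + 1


def follower_ack_completes_quorum(n: int) -> bool:
    """True if the leader plus a single acknowledging follower form a quorum.

    Assumption (not stated in the post): the leader counts as one of the
    servers that has logged the proposal.
    """
    return 2 >= quorum_size(n)


for n in (3, 5, 7):
    print(n, quorum_size(n), follower_ack_completes_quorum(n))
```

Only for n = 3 does one follower's ACK complete a majority, which is why the post limits the idea to 3-server ensembles.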

RE: 3-server Zab cluster

Posted by Flavio P JUNQUEIRA <fp...@apache.org>.

Indeed, I meant to say quorum.

-Flavio
On 5 Oct 2015 6:30 pm, "Ibrahim El-sanosi (PGR)" <
i.s.el-sanosi@newcastle.ac.uk> wrote:

> Hi Flavio,
>
>
> > That's not accurate. Being recorded by a quorum guarantees that a txn
> > will be in the initial state of future epochs, but a prospective leader
> > might have txns in its log that haven't been recorded in *a log*. The
> > prospective leader needs to make sure that such txns are recorded in a
> > quorum before establishing a new epoch, though.
>
> I guess you meant a quorum, not a log, where you wrote *log* above!
>
> Thank you
>
> Ibrahim

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.

Hi Flavio,


> That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns in its log that haven't been recorded in *a log*. The prospective leader needs to make sure that such txns are recorded in a quorum before establishing a new epoch, though.

I guess you meant a quorum, not a log, where you wrote *log* above!

Thank you

Ibrahim

-----Original Message-----
From: Flavio Junqueira [mailto:fpj@apache.org] 
Sent: Monday, October 05, 2015 06:23 PM
To: user@zookeeper.apache.org
Subject: Re: 3-server Zab cluster


> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i....@newcastle.ac.uk> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of the synchronization phase, the new leader will not commit any proposals in its transaction log that have not received a majority of acks from the previous ensemble (that is what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns in its log that haven't been recorded in a log. The prospective leader needs to make sure that such txns are recorded in a quorum before establishing a new epoch, though.

> I think what Zab does is that, before the end of the synchronization phase, in L and F2 (the new quorum), L (the prospective leader) will sync its own state with F2 as the initial state. Referring to my scenario, zxid=10 is part of the initial state and, as a result, it will be delivered in the new quorum (L and F2) before new proposals of the new epoch are processed.

Yes, this is right.

> 
> You can read this thread http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html for more info
> 
> What do you think? Does anyone have any questions or concerns about such a (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.

-Flavio

> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> Sent: Thursday, October 01, 2015 06:15 PM
> To: Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
> 
>> (***) OK, I thought that when F2 forms a quorum with L, and before serving clients, L synchronizes its state with F2, resulting in zxid=10 being committed in L and F2 as well. I also thought this process is the same as in Zab, isn't it?
> 
> Since L didn't receive any ACK responses from F1 or F2 before leaving the leader state previously, L won't commit transaction zxid=10. IIUC, after re-forming the new quorum, L will not have any mechanism to re-initiate the proposal (active messaging phase) for the previous zxid=10.
> 
> -Rakesh
> 
> On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> Thank you Rakesh.
> 
>> In your case, the zk client sees a successful response from F1. Then assume F2 joins the quorum first and L becomes the leader again. But the newly formed quorum will not have the zxid=10 transaction. This will make the cluster inconsistent, won't it?
> 
> (***) OK, I thought that when F2 forms a quorum with L, and before serving clients, L synchronizes its state with F2, resulting in zxid=10 being committed in L and F2 as well. I also thought this process is the same as in Zab, isn't it?
> 
> 
>> Apart from the above case, I'm not seeing any other problems with a 3-node cluster. The above data-loss case can be excluded by adding the assumption that more than the tolerated number of server failures may affect cluster consistency and result in data loss.
> 
> Yes, if the solution above (***) is not correct, your assumption makes sense.
> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> Sent: 01 October 2015 17:26
> To: user@zookeeper.apache.org; Ibrahim El-sanosi (PGR)
> 
> Subject: Re: 3-server Zab cluster
> 
> Hi Ibrahim,
> 
> The example below is taken from your older mail thread.
> 
>> 1. The leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs it, sends an ACK, commits, replies to clients and
>> crashes. F2 crashes before receiving P10. L has not received any
>> ACKs.
> 
> My thoughts on the above scenario are:
> 
> In your case, the zk client sees a successful response from F1. Then assume F2 joins the quorum first and L becomes the leader again. But the newly formed quorum will not have the zxid=10 transaction. This will make the cluster inconsistent, won't it?
> 
> Apart from the above case, I'm not seeing any other problems with a 3-node cluster. The above data-loss case can be excluded by adding the assumption that more than the tolerated number of server failures may affect cluster consistency and result in data loss. But I feel this optimization would have more problem cases if we scaled the cluster beyond 3 servers. For now, I'm not thinking in that direction, as your case is limited to a 3-node cluster.
> 
> Regards,
> Rakesh
> 
> 
> On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> Yes Alex, in my post I mentioned that this (small) optimization can only work with a 3-server cluster.
> 
> Who could confirm the optimization can work?
> 
> Ibrahim
> 
> -----Original Message-----
> From: Alexander Shraer [mailto:shralex@gmail.com]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: user@zookeeper.apache.org
> Subject: Re: 3-server Zab cluster
> 
> I'm not 100% sure whether operations that were pending on the leader are sent out during sync when this leader loses quorum and is re-elected. If so, then maybe you're right. But in any case, this would not work for 5 or more servers...
> 
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> 
>> Thank you Alex for replying.
>> 
>> When you said "the leader gets re-elected and the operation is
>> truncated from logs at other servers", I thought the new leader would
>> sync its log with the other followers (synchronization phase),
>> so that the operation would be committed by the new quorum. Let me lay out the scenario as steps:
>> 
>> 1. The leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs it, sends an ACK, commits, replies to clients and crashes. F2
>> crashes before receiving P10. L has not received any ACKs.
>> 
>> Possible solution (1)
>> The leader will move to the LOOKING state as there is no quorum
>> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum
>> with L (the previous leader); L becomes the new leader again as it has the latest zxid (10) in its log.
>> L syncs its state with F2; as a result L, F1 (before crashing) and F2
>> commit P10. Is that correct?
>> 
>> Possible solution (2)
>> The leader will move to the LOOKING state as there is no quorum
>> supporting its leadership. Now assume F1 (with zxid=10 committed)
>> wakes up. I am not sure who should be the leader (F1 with zxid=10
>> committed, or L (the previous leader) with zxid=10 logged). I think
>> F1 becomes the new leader as it has zxid=10 committed. F1 forms a
>> quorum with L (the previous leader), and F1 becomes the new leader as
>> it has the latest zxid (10). F1 (the new leader) syncs its state with
>> L (the previous leader, now a follower); as a result zxid=10 is
>> committed by the new quorum. Is that correct?
>> 
>> What do you think?
>> 
>> Ibrahim
>> 
>> -----Original Message-----
>> From: Alexander Shraer [mailto:shralex@gmail.com]
>> Sent: Monday, September 28, 2015 07:27 PM
>> To: user@zookeeper.apache.org
>> Cc: dev@zookeeper.apache.org
>> Subject: Re: 3-server Zab cluster
>> 
>> Committing locally when sending an ACK at a server would lead to loss
>> of consistency: it is possible that this is the only server that
>> acks, e.g., this server is temporarily disconnected from the leader,
>> the leader gets re-elected and the operation is truncated from logs
>> at other servers. It's OK to ACK it, but it's not OK to commit, since
>> this exposes it to users as a committed operation that they can see.
>> 
>> On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
>> 
>>> In Zab, assume we have a cluster consisting of 3 servers. To deliver a
>>> write request, it must run 3 communication steps: proposal,
>>> acknowledgement and commit.
>>> As Zab uses reliable FIFO channels, it is possible to remove the commit
>>> round. As soon as a follower receives a proposal, it logs it, sends an
>>> ACK and commits locally. Upon receiving an ACK from any follower, the
>>> leader commits the proposal locally; no COMMIT message needs to be sent
>>> to followers. In this case, all servers commit a proposal in two
>>> communication steps, which reduces latency, particularly at the followers.
>>> 
>>> Note that this optimization can only work in a 3-server cluster
>>> (a follower completes a majority as soon as it acks).
>>> Does anyone see any problems with such a (small) optimization?
>>> Ibrahim

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.

>I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.


As Zab uses reliable FIFO channels, it is possible to remove the commit round. As soon as a follower receives a proposal, it logs it, sends an ACK and commits locally. Upon receiving an ACK from any follower, the leader commits the proposal locally; no COMMIT message needs to be sent to followers. In this case, all servers commit a proposal in two communication steps, which reduces latency, particularly at the followers.

Note that this optimization can only work in a 3-server cluster (a follower completes a majority as soon as it acks).

The proposal:

A 3-server ZK cluster is, I think, more commonly used than ensembles of 5, 7, etc. For clients who use a 3-server ZK ensemble and want better latency, we could provide this optimization (the algorithm above) as an option.

I hope my aim is clear now.

Ibrahim 
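
The failure scenario discussed earlier in the thread (F1 commits on ACK and crashes, F2 never receives the proposal) can be sketched as a small Python model. It is an illustration only, with hypothetical names: whether L itself has zxid=10 durably in its log at that point is treated as an open assumption, since participants in the thread read it differently, so both cases are enumerated.

```python
from itertools import combinations

SERVERS = ("L", "F1", "F2")


def logs_after_scenario(leader_logged_before_send: bool) -> dict:
    """Per-server log contents after steps 1-2 of the scenario.

    leader_logged_before_send is an assumption of this sketch, not
    something the thread settles.
    """
    return {
        "L": {10} if leader_logged_before_send else set(),
        "F1": {10},   # logged, ACKed, committed locally, replied to the client
        "F2": set(),  # crashed before receiving the proposal
    }


def quorums_missing_txn(logs: dict, zxid: int) -> list:
    """Two-server quorums in which no member has zxid in its log."""
    return [q for q in combinations(SERVERS, 2)
            if not any(zxid in logs[s] for s in q)]


for leader_logged in (True, False):
    missing = quorums_missing_txn(logs_after_scenario(leader_logged), 10)
    print("leader logged first:", leader_logged, "quorums without zxid 10:", missing)
```

If the leader has the proposal in its log, every recovering quorum contains a copy of zxid=10; if it does not, the quorum formed by L and F2 has no copy, although F1 already reported the write to a client.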


Re: 3-server Zab cluster

Posted by Flavio Junqueira <fp...@apache.org>.

> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i....@newcastle.ac.uk> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of the synchronization phase, the new leader will not commit any proposals in its transaction log that have not received a majority of ACKs from the previous ensemble (that is what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns in its log that haven't been recorded in a quorum. The prospective leader needs to make sure that such txns are recorded in a quorum before establishing a new epoch, though.

> I think what Zab does is that before the end of synchronization phase,  in L and F2 (the new quorum), L (a prospective leader) will sync its own state with F2 as the initial state.  Referring to my scenario, zxid =10 is part of the initial state and as a result it will be delivered in new quorum (L and F2) before  processing new proposals of new epoch.

Yes, this is right.
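
A minimal Python sketch of the synchronization step described above, using the L/F2 scenario from this thread. The Server class, the synchronize function and the concrete zxids 8 to 10 are assumptions made up for illustration, not ZooKeeper code; real synchronization is more involved than copying a list.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    log: list = field(default_factory=list)        # zxids written to the local log
    delivered: list = field(default_factory=list)  # zxids applied and visible


def synchronize(leader, followers, ensemble_size):
    """Make the prospective leader's history the initial state of the new epoch.

    Simplification: each follower simply adopts the leader's log.
    """
    quorum = ensemble_size // 2 + 1
    if 1 + len(followers) < quorum:
        raise RuntimeError("no quorum: the prospective leader cannot establish an epoch")
    for follower in followers:
        follower.log = list(leader.log)
    # Only once a quorum holds the history is it delivered as the initial state.
    for server in [leader, *followers]:
        server.delivered = list(server.log)
    return leader.log[-1] if leader.log else None


# L logged zxid 10 but never saw an ACK; F2 crashed before receiving it.
L = Server("L", log=[8, 9, 10], delivered=[8, 9])
F2 = Server("F2", log=[8, 9], delivered=[8, 9])
last_zxid = synchronize(L, [F2], ensemble_size=3)
```

In this sketch zxid 10 ends up in the delivered state of both L and F2 before any proposal of the new epoch is processed, and a prospective leader with no follower cannot proceed at all.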

> 
> You can read this thread http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html for more info
> 
> What do you think? Does anyone have any questions or concerns about such (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.

-Flavio

> 
> Ibrahim

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.

Hi Rakesh,

In Zab, before the end of the synchronization phase, the new leader will not commit any proposals in its transaction log that have not received a majority of ACKs from the previous ensemble (that is what you are saying).

I think what Zab does is that before the end of synchronization phase,  in L and F2 (the new quorum), L (a prospective leader) will sync its own state with F2 as the initial state.  Referring to my scenario, zxid =10 is part of the initial state and as a result it will be delivered in new quorum (L and F2) before  processing new proposals of new epoch.

You can read this thread http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html for more info

What do you think? Does anyone have any questions or concerns about such (small) optimization?

Ibrahim

From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
Sent: Thursday, October 01, 2015 06:15 PM
To: Ibrahim El-sanosi (PGR)
Subject: Re: 3-server Zab cluster

>>>>>>>>(***) Ok, I thought that when F2 forms a quorum with L, and before serving clients, L synchronizes its state with F2, so that zxid=10 is committed on L and F2 as well. I also thought this process is the same as in Zab, isn't it?

Since L didn't receive any ACK responses from F1 or F2 before leaving the leader status previously, L won't commit transaction zxid=10. IIUC, after re-forming the new quorum, L will not have any mechanism to re-initiate the proposal (active messaging phase) for the previous zxid=10.

-Rakesh

On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <i....@newcastle.ac.uk> wrote:
Thank you Rakesh.

>>>In your case, zk client sees a successful response from F1. Then assume F2 joins quorum first and L becomes the leader again. But the newly formed quorum will not have the zxid=10 transaction. This will make the cluster inconsistent, isn't it?

(***) Ok, I thought that when F2 forms a quorum with L, and before serving clients, L synchronizes its state with F2, so that zxid=10 is committed on L and F2 as well. I also thought this process is the same as in Zab, isn't it?

>>>Apart from the above case I'm not seeing any other problems with 3 node cluster. The above data loss case can be avoided by putting an assumption that more than a tolerated number of server failures may affect the cluster consistency and results in data loss.

Yes, if the solution above (***) is not correct, your assumption makes sense.

Ibrahim

From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
Sent: 01 October 2015 17:26
To: user@zookeeper.apache.org; Ibrahim El-sanosi (PGR)

Subject: Re: 3-server Zab cluster

Hi Ibrahim,

Below example taken from your older mail thread.

>>>>> 1. leader  (L)  sends a proposal p with zxid =10 to F1 and F2.
>>>>> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2 crashes before receiving P10. L has not received any ACKs

My thoughts on the above scenario:

In your case, the zk client sees a successful response from F1. Then assume F2 joins the quorum first and L becomes the leader again. But the newly formed quorum will not have the zxid=10 transaction. This will make the cluster inconsistent, isn't it?

Apart from the above case I'm not seeing any other problems with 3 node cluster. The above data loss case can be avoided by putting an assumption that more than a tolerated number of server failures may affect the cluster consistency and results in data loss. But I feel this optimization would have more cases if we scale up the cluster size beyond 3 servers. Now, I'm not thinking in that direction as your case is limited to 3 node cluster.

Regards,
Rakesh


Re: 3-server Zab cluster

Posted by Rakesh Radhakrishnan <ra...@gmail.com>.

Hi Ibrahim,

Below example taken from your older mail thread.

>>>>> 1. leader  (L)  sends a proposal p with zxid =10 to F1 and F2.
>>>>> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2
crashes before receiving P10. L has not received any ACKs

My thoughts on the above scenario:

In your case, the zk client sees a successful response from F1. Then assume F2
joins the quorum first and L becomes the leader again. But the newly formed
quorum will not have the zxid=10 transaction. This will make the cluster
inconsistent, isn't it?

Apart from the above case, I'm not seeing any other problems with a 3-node
cluster. The above data loss case can be avoided by assuming that more than
the tolerated number of server failures may affect cluster consistency and
result in data loss. But I feel this optimization would have more problem
cases if we scale the cluster beyond 3 servers. For now, I'm not thinking
in that direction, as your case is limited to a 3-node cluster.
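
To make the "tolerated number of server failures" concrete, here is a small Python sketch assuming simple majority quorums. The function names are made up for illustration and are not part of ZooKeeper.

```python
def quorum_size(n: int) -> int:
    """Smallest majority of an ensemble of n servers."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Servers that may be down while a majority quorum can still form."""
    return n - quorum_size(n)


def within_fault_model(n: int, crashed: int) -> bool:
    """True if the number of crashed servers stays within what n servers tolerate."""
    return crashed <= tolerated_failures(n)


# In the scenario above both F1 and F2 crash: two failures in a 3-server ensemble.
scenario_ok = within_fault_model(3, crashed=2)
```

Under this assumption a 3-server ensemble tolerates one failure, so the scenario with F1 and F2 both down falls outside the fault model until one of them recovers.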

Regards,
Rakesh


On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <
i.s.el-sanosi@newcastle.ac.uk> wrote:

> Yes Alex, in my post I mentioned that this (small) optimization can only
> work with 3-servers cluster.
>
> Who could confirm the optimization can work?
>
> Ibrahim

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.

Yes Alex, in my post I mentioned that this (small) optimization can only work with a 3-server cluster.

Who could confirm the optimization can work?

Ibrahim  

-----Original Message-----
From: Alexander Shraer [mailto:shralex@gmail.com] 
Sent: Tuesday, September 29, 2015 12:11 AM
To: user@zookeeper.apache.org
Subject: Re: 3-server Zab cluster

I'm not 100% sure whether operations that were pending on the leader are sent out during sync when this leader loses quorum and is re-elected. If so, then maybe you're right. But in any case, this would not work for 5 or more servers...


Re: 3-server Zab cluster

Posted by Alexander Shraer <sh...@gmail.com>.

I'm not 100% sure whether operations that were pending on the leader are
sent out during sync when this leader loses quorum and is re-elected. If so,
then maybe you're right. But in any case, this would not work for 5 or more
servers...
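
A short Python sketch of why the shortcut is tied to 3 servers, assuming majority quorums and assuming the leader already holds its own copy of the proposal. The function name is hypothetical.

```python
def single_ack_reaches_quorum(n: int) -> bool:
    """Does the leader's copy plus one follower's ACK form a majority of n servers?"""
    copies = 2  # the leader's own log entry plus the acknowledging follower's
    return copies >= n // 2 + 1


results = {n: single_ack_reaches_quorum(n) for n in (3, 5, 7)}
```

With 5 servers a follower that has just acknowledged knows of only two copies, while a majority needs three, so it cannot conclude on its own that the proposal is safe to commit.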

On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <
i.s.el-sanosi@newcastle.ac.uk> wrote:

> Thank you Alex for replying.
>
> When you said "the leader gets re-elected and the operation is truncated
> from logs at other servers", I thought the new leader would sync its log
> with the other followers (synchronization phase), so that the operation
> is committed by the new quorum. Let me lay out the scenario as steps:
>
> 1. Leader (L) sends a proposal P10 with zxid = 10 to F1 and F2.
> 2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2
> crashes before receiving P10. L has not received any ACKs.
>
> Possible solution (1)
> The leader will move to the LOOKING phase, as there is no quorum
> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum with
> L (the previous leader), and L becomes the leader again as it has the
> latest zxid (10) in its log. L syncs its state with F2, and as a result
> L, F1 (before crashing) and F2 commit P10. Is that correct?
>
> Possible solution (2)
> The leader will move to the LOOKING phase, as there is no quorum
> supporting its leadership. Now assume F1 (with zxid = 10 committed) wakes
> up. I am not sure who should be the leader (F1 with zxid = 10 committed,
> or L, the previous leader, with zxid = 10 logged). I think F1 becomes the
> new leader, as it has zxid = 10 committed. F1 forms a quorum with L (the
> previous leader). F1 (the new leader) syncs its state with L (the
> previous leader, now a follower), and as a result zxid = 10 is committed
> by the new quorum. Is that correct?
>
> What do you think?
>
> Ibrahim

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.

Thank you Alex for replying.

When you said "the leader gets re-elected and the operation is truncated from logs at other servers", I thought the new leader would sync its log with the other followers (synchronization phase), so that the operation is committed by the new quorum. Let me lay out the scenario as steps:

1. Leader (L) sends a proposal P10 with zxid = 10 to F1 and F2.
2. F1 logs, sends an ACK, commits, replies to clients and crashes. F2 crashes before receiving P10. L has not received any ACKs.

Possible solution (1)
The leader will move to the LOOKING phase, as there is no quorum supporting its leadership. Now assume F2 wakes up. F2 forms a quorum with L (the previous leader), and L becomes the leader again as it has the latest zxid (10) in its log. L syncs its state with F2, and as a result L, F1 (before crashing) and F2 commit P10. Is that correct?

Possible solution (2)
The leader will move to the LOOKING phase, as there is no quorum supporting its leadership. Now assume F1 (with zxid = 10 committed) wakes up. I am not sure who should be the leader (F1 with zxid = 10 committed, or L, the previous leader, with zxid = 10 logged). I think F1 becomes the new leader, as it has zxid = 10 committed. F1 forms a quorum with L (the previous leader). F1 (the new leader) syncs its state with L (the previous leader, now a follower), and as a result zxid = 10 is committed by the new quorum. Is that correct?
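
On the question in solution (2) of who wins the election, here is a Python sketch that assumes votes are ordered by epoch, then by last logged zxid, then by server id, as ZooKeeper's default leader election is generally described. The Vote tuple, the epoch value and the server ids (L = 1, F1 = 2) are made up for illustration.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    epoch: int      # epoch of the last transaction the server has seen
    last_zxid: int  # last zxid in the server's log, committed or not
    server_id: int  # configured id, used only as a tie-breaker


def preferred(a: Vote, b: Vote) -> Vote:
    """Return the vote that wins under (epoch, last_zxid, server_id) ordering."""
    return max(a, b)  # NamedTuple compares field by field, in declaration order


vote_L = Vote(epoch=1, last_zxid=10, server_id=1)   # zxid 10 logged only
vote_F1 = Vote(epoch=1, last_zxid=10, server_id=2)  # zxid 10 logged and committed
winner = preferred(vote_L, vote_F1)
```

Nothing in this ordering distinguishes a committed zxid from one that is only logged. With equal epochs and zxids the outcome here is decided by the assumed server ids, and either server would carry zxid 10 into the new epoch.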

What do you think? 

Ibrahim

-----Original Message-----
From: Alexander Shraer [mailto:shralex@gmail.com] 
Sent: Monday, September 28, 2015 07:27 PM
To: user@zookeeper.apache.org
Cc: dev@zookeeper.apache.org
Subject: Re: 3-server Zab cluster

Committing locally when sending an ACK at a server would lead to loss of consistency - it is possible that this is the only server that acks, e.g., this server is temporarily disconnected from the leader, the leader gets re-elected and the operation is truncated from logs at other servers. It's OK to ACK it, but it's not OK to commit, since this exposes it to users as a committed operation that they can see.

On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) < i.s.el-sanosi@newcastle.ac.uk> wrote:

> In Zab, assume we have a cluster consists of 3-servers. To deliver a 
> write request, it must run 3 communication steps proposal, 
> acknowledgement and commit.
> As Zab uses reliable FIFO, it is possible to remove commit round. As 
> soon as a follower receives a proposal, it logs, sends an ACK and 
> commits locally. Upon receiving ACK from any follower, leader commits 
> a proposal locally, no COMMIT message need to be sent to followers. In 
> this case, all servers commit a proposal in two round-trips, resulting 
> in reducing latency particularly in followers.
>
> Note that this optimization can only work in 3-servers cluster 
> (follower reaches a majority as soon as it acks).
> Does anyone see any problems with such (small) optimization?
> Ibrahim
>

Re: 3-server Zab cluster

Posted by Alexander Shraer <sh...@gmail.com>.

Committing locally when sending an ACK at a server would lead to loss of
consistency - it is possible that this is the only
server that acks, e.g., this server is temporarily disconnected from the
leader, the leader gets re-elected and the operation is truncated from logs
at other servers. It's OK to ACK it, but it's not OK to commit, since this
exposes it to users as a committed operation that they can see.
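
The failure described above can be sketched as a toy simulation. This is one possible interleaving under stated assumptions, not ZooKeeper code: the `Server` class, the `exposed_but_lost` helper, and the assumption that the old leader fails before persisting zxid 10 are all made up for illustration.

```python
class Server:
    def __init__(self, name):
        self.name = name
        self.log = [9]       # zxids on disk; 9 is already agreed everywhere
        self.visible = [9]   # zxids that clients could have read from this server

    def on_proposal(self, zxid, commit_on_ack):
        self.log.append(zxid)            # log, then ACK (the ACK itself is not modeled)
        if commit_on_ack:
            self.visible.append(zxid)    # the shortcut under discussion

    def sync_with(self, leader_log):
        # A rejoining follower drops entries the new leader does not have.
        self.log = [z for z in self.log if z in leader_log]

def exposed_but_lost(commit_on_ack):
    old_leader, f1, f2 = Server("L"), Server("F1"), Server("F2")
    # Only F1 receives proposal 10, then gets disconnected.
    # Assumption: the old leader fails before zxid 10 reaches its own disk.
    f1.on_proposal(10, commit_on_ack)
    # F2 and the restarted old leader form a quorum; neither has zxid 10.
    quorum = [f2, old_leader]
    new_leader_log = max((s.log for s in quorum), key=lambda log: log[-1])
    f1.sync_with(new_leader_log)         # zxid 10 is truncated on F1
    # Operations a client saw on F1 that are missing from the agreed history.
    return [z for z in f1.visible if z not in new_leader_log]

with_shortcut = exposed_but_lost(commit_on_ack=True)
without_shortcut = exposed_but_lost(commit_on_ack=False)
```

With the shortcut, zxid 10 was visible on F1 and later disappears from the history; when the commit waits for a separate COMMIT, nothing a client saw is lost.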

On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <
i.s.el-sanosi@newcastle.ac.uk> wrote:

> In Zab, assume we have a cluster consists of 3-servers. To deliver a write
> request, it must run 3 communication steps proposal, acknowledgement and
> commit.
> As Zab uses reliable FIFO, it is possible to remove commit round. As soon
> as a follower receives a proposal, it logs, sends an ACK and commits
> locally. Upon receiving ACK from any follower, leader commits a proposal
> locally, no COMMIT message need to be sent to followers. In this case, all
> servers commit a proposal in two round-trips, resulting in reducing latency
> particularly in followers.
>
> Note that this optimization can only work in 3-servers cluster (follower
> reaches a majority as soon as it acks).
> Does anyone see any problems with such (small) optimization?
> Ibrahim
>
