Posted to user@zookeeper.apache.org by Rakesh Radhakrishnan <ra...@gmail.com> on 2015/10/01 18:26:01 UTC

Re: 3-server Zab cluster

Hi Ibrahim,

The example below is taken from your earlier mail in this thread.

>>>>> 1. The leader (L) sends a proposal p with zxid=10 (P10) to F1 and F2.
>>>>> 2. F1 logs it, sends an ACK, commits, replies to clients and crashes. F2
>>>>> crashes before receiving P10. L has not received any ACKs.

My thoughts on the above scenario:

In your case, the ZK client sees a successful response from F1. Now assume F2
joins the quorum first and L becomes the leader again. The newly formed
quorum will not have the zxid=10 transaction, which makes the cluster
inconsistent, doesn't it?

Apart from the above case, I don't see any other problems with a 3-node
cluster. The data-loss case above can be avoided by adding the assumption
that more than the tolerated number of server failures may affect cluster
consistency and result in data loss. But I feel this optimization would
run into more such cases if we scale the cluster beyond 3 servers. For now,
I'm not thinking in that direction, as your case is limited to a 3-node cluster.

Regards,
Rakesh
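
A toy model of the scenario above may help pin down where the disagreement lies. It is a sketch under stated assumptions, not ZooKeeper code: the Server class, run_scenario() and the leader_logged flag are all hypothetical. The flag captures the open question in this thread, namely whether L's own log still holds P10 when L leads the new quorum {L, F2}.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for a ZooKeeper server; not the real implementation."""
    name: str
    log: list[int] = field(default_factory=list)        # zxids in the transaction log
    committed: list[int] = field(default_factory=list)  # zxids visible to clients


def run_scenario(leader_logged: bool) -> bool:
    """Return True if zxid 10 is committed by the new quorum {L, F2}."""
    L, F1, F2 = Server("L"), Server("F1"), Server("F2")

    # Step 1: L proposes zxid 10. Whether L's log still holds it when L is
    # re-elected is the open question, so it is a parameter of the model.
    if leader_logged:
        L.log.append(10)

    # Step 2 (proposed optimization): F1 logs, ACKs and commits at once, so a
    # client reading from F1 has already seen zxid 10. F1 then crashes.
    F1.log.append(10)
    F1.committed.append(10)
    # F2 crashed before receiving P10, and L received no ACKs.

    # Recovery: F2 rejoins and {L, F2} forms a quorum with L as leader.
    # Synchronization: F2 copies any txns from L's log that it is missing,
    # then both servers commit everything in their logs.
    for zxid in L.log:
        if zxid not in F2.log:
            F2.log.append(zxid)
    for server in (L, F2):
        server.committed = list(server.log)

    return 10 in L.committed and 10 in F2.committed


if __name__ == "__main__":
    print(run_scenario(leader_logged=True))   # P10 survives the leader change
    print(run_scenario(leader_logged=False))  # F1's client-visible commit is lost
```

run_scenario(True) returns True: synchronization carries zxid=10 to F2, as in Ibrahim's possible solution (1). run_scenario(False) returns False: F1 has already exposed zxid=10 to a client, yet the new quorum never commits it, which is the inconsistency described above. The model does not decide which case real Zab is in; it only shows that the outcome hinges on L's log.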


On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:

> Yes Alex, in my post I mentioned that this (small) optimization can only
> work with a 3-server cluster.
>
> Can anyone confirm whether the optimization works?
>
> Ibrahim
>
> -----Original Message-----
> From: Alexander Shraer [mailto:shralex@gmail.com]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: user@zookeeper.apache.org
> Subject: Re: 3-server Zab cluster
>
> I'm not 100% sure whether operations that were pending on the leader are
> sent out during sync when this leader loses quorum and is re-elected. If so,
> then maybe you're right. But in any case, this would not work for 5 or more
> servers...
>
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
>
> > Thank you, Alex, for replying.
> >
> > When you said "the leader gets re-elected and the operation is
> > truncated from logs at other servers", I thought the new leader would
> > sync its log with the other followers (synchronization phase), so that
> > the operation would be committed by the new quorum. Let me lay out the
> > scenarios as steps:
> >
> > 1. The leader (L) sends a proposal p with zxid=10 (P10) to F1 and F2.
> > 2. F1 logs it, sends an ACK, commits, replies to clients and crashes. F2
> > crashes before receiving P10. L has not received any ACKs.
> >
> > Possible solution (1)
> > The leader moves to the LOOKING state, as there is no quorum supporting
> > its leadership. Now assume F2 wakes up. F2 forms a quorum with L (the
> > previous leader), and L becomes the leader again, as it has the latest
> > zxid (10) in its log. L syncs its state with F2; as a result, L, F1 (before
> > crashing) and F2 all commit P10. Is that correct?
> >
> > Possible solution (2)
> > The leader moves to the LOOKING state, as there is no quorum supporting
> > its leadership. Now assume F1 (with zxid=10 committed) wakes up. I am not
> > sure who should be the leader: F1 (with zxid=10 committed) or L (the
> > previous leader, with zxid=10 logged). I think F1 becomes the new leader,
> > as it has zxid=10 committed. F1 forms a quorum with L (the previous
> > leader); F1 becomes the new leader, as it has the latest zxid (10). F1 (the
> > new leader) syncs its state with L (the previous leader, now a follower);
> > as a result, zxid=10 is committed by the new quorum. Is that correct?
> >
> > What do you think?
> >
> > Ibrahim
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Alexander Shraer [mailto:shralex@gmail.com]
> > Sent: Monday, September 28, 2015 07:27 PM
> > To: user@zookeeper.apache.org
> > Cc: dev@zookeeper.apache.org
> > Subject: Re: 3-server Zab cluster
> >
> > Committing locally when sending an ACK at a server would lead to loss
> > of consistency: it is possible that this is the only server that
> > acks, e.g., this server is temporarily disconnected from the leader,
> > the leader gets re-elected and the operation is truncated from logs at
> > other servers. It's ok to ACK it, but it's not ok to commit, since this
> > exposes it to users as a committed operation that they can see.
> >
> > On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> >
> > > In Zab, assume we have a cluster of 3 servers. To deliver a
> > > write request, Zab runs 3 communication steps: proposal,
> > > acknowledgement and commit.
> > > As Zab uses reliable FIFO channels, it is possible to remove the commit
> > > round. As soon as a follower receives a proposal, it logs it, sends an
> > > ACK and commits locally. Upon receiving an ACK from any follower, the
> > > leader commits the proposal locally; no COMMIT message needs to be sent
> > > to followers. In this case, all servers commit a proposal within two
> > > communication steps instead of three, reducing latency, particularly at
> > > followers.
> > >
> > > Note that this optimization can only work in a 3-server cluster
> > > (a follower's ACK, together with the leader, already forms a majority).
> > > Does anyone see any problems with this (small) optimization?
> > > Ibrahim
> > >
> >
>
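
Possible solution (2) above asks whether F1 (zxid=10 committed) or L (zxid=10 logged) should lead. A minimal sketch, assuming votes are ordered by (epoch, last logged zxid, server id), which is our understanding of how ZooKeeper's fast leader election compares votes; the Vote type, the elect() helper and the server ids are hypothetical. Under that assumption, whether a txn is committed plays no part in the vote.

```python
from typing import NamedTuple


class Vote(NamedTuple):
    """Hypothetical vote; the field order defines the comparison order."""
    epoch: int  # server's current epoch (simplified)
    zxid: int   # last *logged* zxid; commit status is not part of the vote
    sid: int    # server id, used only to break ties


def elect(votes: list[Vote]) -> Vote:
    """Pick the winning vote: highest epoch, then highest zxid, then highest sid."""
    # NamedTuple instances compare like plain tuples, field by field.
    return max(votes)


if __name__ == "__main__":
    l_vote = Vote(epoch=1, zxid=10, sid=1)   # L: P10 logged
    f1_vote = Vote(epoch=1, zxid=10, sid=2)  # F1: P10 logged and committed
    print(elect([l_vote, f1_vote]).sid)      # tie on (epoch, zxid): higher sid wins
```

With equal epoch and zxid, the winner in this model is decided only by the hypothetical server ids. Either L or F1 would carry zxid=10 into the new epoch, because both have it in their logs.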

RE: 3-server Zab cluster

Posted by Flavio P JUNQUEIRA <fp...@apache.org>.
Indeed, I meant to say quorum.

-Flavio
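
Flavio's point that being recorded by a quorum guarantees a txn will be in the initial state of future epochs rests on majority intersection: any two majorities of the same ensemble share at least one server. A small brute-force check, with hypothetical server names, illustrates why the leader plus one follower suffices for 3 servers but not for 5. It only shows that some member of every future quorum holds the txn; synchronization still has to propagate it.

```python
from itertools import combinations


def majorities(servers: list[str]) -> list[frozenset[str]]:
    """Every subset of servers that forms a strict majority."""
    k = len(servers) // 2 + 1
    return [frozenset(combo)
            for size in range(k, len(servers) + 1)
            for combo in combinations(servers, size)]


def survives(recorded_at: set[str], servers: list[str]) -> bool:
    """True if every possible majority contains a server whose log holds the txn."""
    return all(quorum & recorded_at for quorum in majorities(servers))


if __name__ == "__main__":
    three = ["L", "F1", "F2"]
    five = ["L", "F1", "F2", "F3", "F4"]
    print(survives({"L", "F1"}, three))  # leader + one follower: a majority of 3
    print(survives({"L", "F1"}, five))   # {F2, F3, F4} misses both holders
    print(survives({"F1"}, three))       # {L, F2} misses F1
```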
On 5 Oct 2015 6:30 pm, "Ibrahim El-sanosi (PGR)" <i.s.el-sanosi@newcastle.ac.uk> wrote:

> Hi Flavio,
>
>
> >That's not accurate. Being recorded by a quorum guarantees that a txn
> >will be in the initial state of future epochs, but a prospective leader
> >might have txns in its log that haven't been recorded in *a log*. The
> >prospective leader needs to make sure that such txns are recorded in a
> >quorum before establishing a new epoch, though.
>
> I guess you meant "a quorum" rather than "a log" in the text above!
>
> Thank you
>
> Ibrahim
>
> -----Original Message-----
> From: Flavio Junqueira [mailto:fpj@apache.org]
> Sent: Monday, October 05, 2015 06:23 PM
> To: user@zookeeper.apache.org
> Subject: Re: 3-server Zab cluster
>
>
> > On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> >
> > Hi Rakesh,
> >
> > In Zab, before the end of the synchronization phase, the new leader will not
> > commit any proposals in the transaction logs that have not received a majority
> > of ACKs from the previous ensemble (that is what you are saying).
>
> That's not accurate. Being recorded by a quorum guarantees that a txn will
> be in the initial state of future epochs, but a prospective leader might
> have txns in its log that haven't been recorded in a log. The prospective
> leader needs to make sure that such txns are recorded in a quorum before
> establishing a new epoch, though.
>
> > I think what Zab does is that, before the end of the synchronization phase,
> > in L and F2 (the new quorum), L (a prospective leader) syncs its own
> > state with F2 as the initial state. Referring to my scenario, zxid=10 is
> > part of the initial state, and as a result it will be delivered in the new
> > quorum (L and F2) before processing new proposals of the new epoch.
>
> Yes, this is right.
>
> >
> > You can read this thread for more info:
> > http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html
> >
> > What do you think? Does anyone have any questions or concerns about this
> > (small) optimization?
>
> I'm not entirely sure what the optimization is and if you are proposing a
> change or what. Are you looking for a blessing from this community? I'd
> like to understand what you're trying to achieve.
>
> -Flavio
>
> >
> > Ibrahim
> >
> > From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
> > Sent: Thursday, October 01, 2015 06:15 PM
> > To: Ibrahim El-sanosi (PGR)
> > Subject: Re: 3-server Zab cluster
> >
> >>>>>>>>> (***) Ok, I thought that when F2 forms a quorum with L, and before
> >>>>>>>>> serving clients, L synchronizes its state with F2, so zxid=10 will be
> >>>>>>>>> committed in L and F2 as well. I also thought this process is the
> >>>>>>>>> same as in Zab, isn't it?
> >
> > Since L didn't receive any ACK responses from F1 or F2 before leaving
> > leader status, L won't commit transaction zxid=10. IIUC, after re-forming
> > the new quorum, L will not have any mechanism to re-initiate the proposal
> > (active messaging phase) for the previous zxid=10.
> >
> > -Rakesh
> >
> > On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> > Thank you Rakesh.
> >
> >>>> In your case, the ZK client sees a successful response from F1. Now assume F2
> >>>> joins the quorum first and L becomes the leader again. The newly formed
> >>>> quorum will not have the zxid=10 transaction, which makes the cluster
> >>>> inconsistent, doesn't it?
> >
> > (***) Ok, I thought that when F2 forms a quorum with L, and before serving
> > clients, L synchronizes its state with F2, so zxid=10 will be
> > committed in L and F2 as well. I also thought this process is the same as
> > in Zab, isn't it?
> >
> >
> >>>> Apart from the above case, I don't see any other problems with a 3-node
> >>>> cluster. The data-loss case above can be avoided by adding the assumption
> >>>> that more than the tolerated number of server failures may affect cluster
> >>>> consistency and result in data loss.
> >
> > Yes, if the solution above (***) is not correct, your assumption makes
> > sense.
> >
> > Ibrahim
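
The rule Flavio describes can be sketched as follows: before establishing a new epoch, the prospective leader makes sure every txn in its log, including ones like zxid=10 that no quorum had recorded, is held by a majority. This is a simplified model, not ZooKeeper's sync code; establish_epoch(), its arguments and the zxids 8 and 9 are hypothetical, and it assumes each follower's log is a prefix of the leader's, so no truncation is needed.

```python
def establish_epoch(leader_log: list[int], follower_logs: dict[str, list[int]],
                    acked: set[str], ensemble_size: int) -> bool:
    """Sync followers in `acked` to the leader's log; True if a majority holds it.

    Assumes every follower log is a prefix of leader_log, so syncing only
    appends the missing suffix (a real implementation may also truncate).
    """
    holders = 1  # the prospective leader already holds its own history
    for name in acked:
        log = follower_logs[name]
        log.extend(leader_log[len(log):])  # send the txns the follower lacks
        holders += 1
    # Only with a majority may the leader start proposing in the new epoch.
    return holders >= ensemble_size // 2 + 1


if __name__ == "__main__":
    logs = {"F2": [8, 9]}
    print(establish_epoch([8, 9, 10], logs, {"F2"}, 3), logs["F2"])
    print(establish_epoch([8, 9, 10], {"F2": [8, 9]}, set(), 3))
```

With F2 acknowledging, L and F2 form a majority of 3 and both now hold zxid=10, consistent with Flavio's "Yes, this is right" above; with no follower, L alone cannot establish the epoch.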

RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.
Hi Flavio,


>That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns in its log that haven't been recorded in *a log*. The prospective leader needs to make sure that such txns are recorded in a quorum before establishing a new epoch, though.

I guess you meant "a quorum" rather than "a log" in the text above!

Thank you

Ibrahim



RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.
>I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.


As Zab uses reliable FIFO channels, it is possible to remove the commit round. As soon as a follower receives a proposal, it logs it, sends an ACK and commits locally. Upon receiving an ACK from any follower, the leader commits the proposal locally; no COMMIT message needs to be sent to followers. In this case, all servers commit a proposal within two communication steps instead of three, reducing latency, particularly at followers.

Note that this optimization can only work in a 3-server cluster (a follower's ACK, together with the leader, already forms a majority).

The proposal:

A 3-server ZK ensemble is, I think, more commonly used than 5- or 7-server ensembles. For clients who use a 3-server ZK ensemble and want better latency, we could provide this optimization (the algorithm above) as an option.

I hope my aim is clear now.

Ibrahim 
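
The arithmetic behind "can only work in a 3-server cluster" can be made explicit. A minimal sketch, assuming the leader counts its own logged copy toward the majority and numbering communication steps as 1 = PROPOSAL delivered, 2 = ACK delivered, 3 = COMMIT delivered; the function names are hypothetical, and this models message counts only, not ZooKeeper's implementation.

```python
def majority(n: int) -> int:
    """Smallest number of servers forming a strict majority of an n-server ensemble."""
    return n // 2 + 1


def follower_acks_needed(n: int) -> int:
    """Follower ACKs the leader needs, counting its own logged copy toward the majority."""
    return majority(n) - 1


def commit_step(role: str, fast_commit: bool) -> int:
    """Communication step at which a server commits one write.

    Steps: 1 = PROPOSAL delivered, 2 = ACK delivered, 3 = COMMIT delivered.
    fast_commit models the proposed variant, in which followers commit on
    receiving the PROPOSAL and no COMMIT message is sent.
    """
    if role == "leader":
        return 2  # in both variants the leader commits once the ACK arrives
    return 1 if fast_commit else 3


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(n, follower_acks_needed(n))
    print(commit_step("follower", fast_commit=False),
          commit_step("follower", fast_commit=True))
```

For n = 3, 5 and 7 the follower ACKs needed are 1, 2 and 3, so only in the 3-server case does a single follower's ACK complete a majority, which is also why the idea would not carry over to 5 or more servers. The step count shows where the latency gain would come from: followers commit at step 1 instead of step 3.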

-----Original Message-----
From: Flavio Junqueira [mailto:fpj@apache.org] 
Sent: Monday, October 05, 2015 06:23 م
To: user@zookeeper.apache.org
Subject: Re: 3-server Zab cluster


> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i....@newcastle.ac.uk> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of synchronization phase, new leader will not commit any proposals in transaction logs that have not got a majority of acks from pervious ensemble  (that what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns it its log that haven't been recorded in a log. The prospective leader needs to make sure that such txns are recorded in a quorum before establishing a new epoch, though.

> I think what Zab does is that before the end of synchronization phase,  in L and F2 (the new quorum), L (a prospective leader) will sync its own state with F2 as the initial state.  Referring to my scenario, zxid =10 is part of the initial state and as a result it will be delivered in new quorum (L and F2) before  processing new proposals of new epoch.

Yes, this is right.

> 
> You can read this thread 
> http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581
> 583.html 
> <http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td758
> 1583.html> for more info
> 
> What do you think? Does anyone have any questions or concerns about such (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.

-Flavio

> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com 
> <ma...@gmail.com>]
> Sent: Thursday, October 01, 2015 06:15 م
> To: Ibrahim El-sanosi (PGR)
> Subject: Re: 3-server Zab cluster
> 
>>>>>>>>> (***) Ok, I thought when F2 form a quorum with L and  before serving clients, L synchronizes its state with F2, resulting in zxid=10 will be committed in L and F2 as well. I also though this process is the same as Zab, isn't it?
> 
> Since L didn't receives any ACK responses from F1 or F2 before leaving the Leader status previously, L won't commit transaction zxid=10. IIUC after re-forming the new quorum L will not have any mechanism to re-initiate the proposal(Active messaging phase) for the previous zxid=10.
> 
> -Rakesh
> 
> On Thu, Oct 1, 2015 at 10:19 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk <ma...@newcastle.ac.uk><mailto:i.s.el-sanosi@newcastle.ac.uk <ma...@newcastle.ac.uk>>> wrote:
> Thank you Rakesh.
> 
>>>> In your case, zk client sees a successful response from F1. Then assume F2 >>>joins quorum first and L become the leader again. But the newly formed >>>quorum will not have the zxid=10 transaction. This will make the cluster >>>inconsistent, isn't it?
> 
> (***) Ok, I thought when F2 form a quorum with L and  before serving clients, L synchronizes its state with F2, resulting in zxid=10 will be committed in L and F2 as well. I also though this process is the same as Zab, isn't it?
> 
> 
>>>> Apart from the above case I'm not seeing any other problems with 3 node >>>cluster. The above data loss case can be avoided by putting an assumption >>>that more than a tolerated number of server failures may affect the cluster >>>consistency and results in data loss.
> 
> Yes, if the solution above (***) is not correct, you assumption makes sense.
> 
> Ibrahim
> 
> From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com 
> <ma...@gmail.com><mailto:rakeshr.apache@gmail.com 
> <ma...@gmail.com>>]
> Sent: 01 October 2015 17:26
> To: user@zookeeper.apache.org 
> <ma...@zookeeper.apache.org><mailto:user@zookeeper.apache.org 
> <ma...@zookeeper.apache.org>>; Ibrahim El-sanosi (PGR)
> 
> Subject: Re: 3-server Zab cluster
> 
> Hi Ibrahim,
> 
> Below example taken from your older mail thread.
> 
>>>>>> 1. leader  (L)  sends a proposal p with zxid =10 to F1 and F2.
>>>>>> 2. F1 logs, sends an ACK, commits, replays to clients and 
>>>>>> crashes. F2 crashes before receiving P10. L has not received any 
>>>>>> ACKs
> 
> My thoughts for the above scenario is,
> 
> In your case, zk client sees a successful response from F1. Then assume F2 joins quorum first and L become the leader again. But the newly formed quorum will not have the zxid=10 transaction. This will make the cluster inconsistent, isn't it?
> 
> Apart from the above case I'm not seeing any other problems with 3 node cluster. The above data loss case can be avoided by putting an assumption that more than a tolerated number of server failures may affect the cluster consistency and results in data loss. But I feel this optimization would have more cases if we scale up the cluster size beyond 3 servers. Now, I'm not thinking in that direction as your case is limited to 3 node cluster.
> 
> Regards,
> Rakesh
> 
> 
> On Tue, Sep 29, 2015 at 2:28 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> Yes Alex, in my post I mentioned that this (small) optimization can only work with a 3-server cluster.
> 
> Can anyone confirm that the optimization works?
> 
> Ibrahim
> 
> -----Original Message-----
> From: Alexander Shraer [mailto:shralex@gmail.com]
> Sent: Tuesday, September 29, 2015 12:11 AM
> To: user@zookeeper.apache.org
> Subject: Re: 3-server Zab cluster
> 
> I'm not 100% sure whether operations that were pending on the leader are sent out during sync when this leader loses quorum and is re-elected. If so, then maybe you're right. But in any case, this would not work for 5 or more servers...
> 
> On Mon, Sep 28, 2015 at 3:51 PM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
> 
>> Thank you, Alex, for replying.
>> 
>> When you said "the leader gets re-elected and the operation is
>> truncated from logs at other servers", I thought the new leader would
>> sync its log with the other followers (synchronization phase),
>> so that the operation is committed by the new quorum. Let me lay out the scenario as steps:
>> 
>> 1. Leader (L) sends a proposal p with zxid=10 to F1 and F2.
>> 2. F1 logs it, sends an ACK, commits, replies to clients and crashes. F2
>> crashes before receiving P10. L has not received any ACKs.
>> 
>> Possible solution (1)
>> The leader moves to the LOOKING state, as there is no quorum
>> supporting its leadership. Now assume F2 wakes up. F2 forms a quorum
>> with L (the previous leader), and L becomes leader again because it has the latest zxid (10) in its log.
>> L syncs its state with F2; as a result L, F1 (before crashing) and F2
>> commit P10. Is that correct?
>> 
>> Possible solution (2)
>> The leader moves to the LOOKING state, as there is no quorum
>> supporting its leadership. Now assume F1 (with zxid=10 committed)
>> wakes up. I am not sure who should be the leader (F1, with zxid=10
>> committed, or L, the previous leader, with zxid=10 only logged). I think
>> F1 becomes the new leader as it has zxid=10 committed. F1 forms a quorum
>> with L (the previous leader), and F1 becomes the new leader as it has the
>> latest zxid (10). F1 (the new leader) syncs its state with L (the previous
>> leader, now a follower); as a result zxid=10 is committed by the new
>> quorum. Is that correct?
>> 
>> What do you think?
>> 
>> Ibrahim
>> 
>> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: Alexander Shraer [mailto:shralex@gmail.com]
>> Sent: Monday, September 28, 2015 07:27 PM
>> To: user@zookeeper.apache.org
>> Cc: dev@zookeeper.apache.org
>> Subject: Re: 3-server Zab cluster
>> 
>> Committing locally when sending an ACK at a server would lead to loss
>> of consistency - it is possible that this is the only server that
>> acks, e.g., this server is temporarily disconnected from the leader,
>> the leader gets re-elected and the operation is truncated from logs
>> at other servers. It's OK to ACK it, but it's not OK to commit, since
>> this exposes it to users as a committed operation that they can see.
>> 
>> On Mon, Sep 28, 2015 at 4:19 AM, Ibrahim El-sanosi (PGR) <i.s.el-sanosi@newcastle.ac.uk> wrote:
>> 
>>> In Zab, assume we have a cluster consisting of 3 servers. To deliver a
>>> write request, it must run 3 communication steps: proposal,
>>> acknowledgement and commit.
>>> As Zab uses reliable FIFO channels, it is possible to remove the commit
>>> round. As soon as a follower receives a proposal, it logs it, sends an
>>> ACK and commits locally. Upon receiving an ACK from any follower, the
>>> leader commits the proposal locally; no COMMIT message needs to be sent
>>> to followers. In this case, all servers commit a proposal in two
>>> communication steps instead of three, reducing latency, particularly at followers.
>>> 
>>> Note that this optimization can only work in a 3-server cluster
>>> (a follower reaches a majority as soon as it acks).
>>> Does anyone see any problems with such a (small) optimization?
>>> Ibrahim
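A rough way to see both the counting argument behind the proposal and Alex's objection is the Python sketch below. It is a simplified model, not ZooKeeper code: `majority`, `one_ack_is_quorum`, `recover` and the log contents are hypothetical, and recovery is reduced to "the member of the new quorum with the highest last zxid becomes leader and its log becomes the initial history", ignoring epochs, server-id tie-breaking and the actual sync messages.

```python
from typing import Dict, List, Set


def majority(n: int) -> int:
    """Smallest number of servers that is a strict majority of n."""
    return n // 2 + 1


def one_ack_is_quorum(n: int) -> bool:
    """Leader + one acking follower = 2 copies; is that already a majority?"""
    return 2 >= majority(n)


def recover(logs: Dict[str, List[int]], quorum: Set[str]) -> List[int]:
    """Simplified recovery (assumption, not real Zab): the quorum member with
    the highest last zxid leads, and its log becomes the new epoch's history."""
    leader = max(sorted(quorum), key=lambda s: logs[s][-1])
    return list(logs[leader])


# Counting argument: the shortcut only holds for a 3-server ensemble.
for n in (3, 5, 7):
    print(n, majority(n), one_ack_is_quorum(n))

# Alex's objection with 5 servers: L proposed zxid 10, only F1 logged it and,
# under the proposed optimization, committed it and answered a client.
logs5 = {
    "L": [8, 9, 10],
    "F1": [8, 9, 10],
    "F2": [8, 9],
    "F3": [8, 9],
    "F4": [8, 9],
}
# L and F1 are cut off; F2, F3 and F4 still form a majority of five.
history = recover(logs5, {"F2", "F3", "F4"})
print(history, 10 in history)  # prints: [8, 9] False
```

With n = 3 no majority can exclude both L and F1, which is why the proposal is limited to three servers; note that the model simply assumes L has durably logged its own proposal, which the thread does not establish.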


Re: 3-server Zab cluster

Posted by Flavio Junqueira <fp...@apache.org>.
> On 05 Oct 2015, at 18:13, Ibrahim El-sanosi (PGR) <i....@newcastle.ac.uk> wrote:
> 
> Hi Rakesh,
> 
> In Zab, before the end of the synchronization phase, the new leader will not commit any proposals in the transaction logs that have not received a majority of ACKs from the previous ensemble (that's what you are saying).

That's not accurate. Being recorded by a quorum guarantees that a txn will be in the initial state of future epochs, but a prospective leader might have txns in its log that haven't been recorded by a quorum. The prospective leader needs to make sure that such txns are recorded by a quorum before establishing a new epoch, though.
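Flavio's point can be pictured with a small Python sketch. `establish_epoch` and the `acked` set are hypothetical; in ZooKeeper itself the sync is carried out with DIFF/TRUNC/SNAP and acknowledged through NEWLEADER, which the sketch collapses into copying lists. The prospective leader makes every follower that acknowledges the sync adopt its history, and only treats the new epoch as established once that set, including itself, is a majority, so every txn in its log, whether or not it was committed before, ends up recorded by a quorum.

```python
from typing import Dict, List, Optional, Set


def establish_epoch(
    leader: str,
    logs: Dict[str, List[int]],
    acked: Set[str],
    ensemble_size: int,
) -> Optional[Dict[str, List[int]]]:
    """Return the synced logs if the new epoch can be established, else None.

    `acked` is the (hypothetical) set of followers that acknowledged the sync.
    """
    history = list(logs[leader])
    synced = {leader: history}
    for follower in sorted(acked):
        # The follower adopts the leader's history: missing txns are added,
        # txns the leader does not have are dropped (truncated).
        synced[follower] = list(history)
    # Every member of `synced` now holds the whole history, so once `synced`
    # is a majority, every txn in the leader's log is recorded by a quorum.
    if len(synced) < ensemble_size // 2 + 1:
        return None
    return synced


# Ibrahim's scenario: L logged zxid 10 but never saw an ACK; F2 never got it.
logs = {"L": [9, 10], "F2": [9]}
print(establish_epoch("L", logs, {"F2"}, 3))  # F2 now also holds zxid 10
print(establish_epoch("L", logs, set(), 3))   # None: L alone is no quorum
```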

> I think what Zab does is that, before the end of the synchronization phase, in L and F2 (the new quorum), L (the prospective leader) syncs its own state with F2 as the initial state. Referring to my scenario, zxid=10 is part of the initial state, and as a result it will be delivered by the new quorum (L and F2) before processing new proposals of the new epoch.

Yes, this is right.

> 
> You can read this thread for more info: http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html
> 
> What do you think? Does anyone have any questions or concerns about such a (small) optimization?

I'm not entirely sure what the optimization is and if you are proposing a change or what. Are you looking for a blessing from this community? I'd like to understand what you're trying to achieve.

-Flavio



RE: 3-server Zab cluster

Posted by "Ibrahim El-sanosi (PGR)" <i....@newcastle.ac.uk>.
Hi Rakesh,

In Zab, before the end of the synchronization phase, the new leader will not commit any proposals in the transaction logs that have not received a majority of ACKs from the previous ensemble (that's what you are saying).

I think what Zab does is that, before the end of the synchronization phase, in L and F2 (the new quorum), L (the prospective leader) syncs its own state with F2 as the initial state. Referring to my scenario, zxid=10 is part of the initial state, and as a result it will be delivered by the new quorum (L and F2) before processing new proposals of the new epoch.
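A quick Python check of this claim for three servers, under a simplified model: whichever majority forms after the crashes, the elected leader already holds zxid 10, so it becomes part of the new epoch's initial state. Assumptions not taken from the thread: the log contents are hypothetical, L is assumed to have durably logged zxid 10 itself, and election is reduced to "highest last logged zxid wins" (ZooKeeper's fast leader election compares epoch, then zxid, then server id). In this model only the last logged zxid matters, so F1 having committed zxid 10 and L having merely logged it makes no difference.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical state after step 2 of the scenario: L and F1 logged zxid 10,
# F2 crashed before receiving it.
logs: Dict[str, List[int]] = {"L": [9, 10], "F1": [9, 10], "F2": [9]}


def majorities(servers: List[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every subset of `servers` that is a strict majority."""
    need = len(servers) // 2 + 1
    for size in range(need, len(servers) + 1):
        yield from combinations(servers, size)


def elect(quorum: Tuple[str, ...]) -> str:
    """Simplified election: highest last logged zxid wins; ties go to the
    alphabetically first name here (ZooKeeper would use the server id)."""
    return max(sorted(quorum), key=lambda s: logs[s][-1])


for quorum in majorities(sorted(logs)):
    leader = elect(quorum)
    # The leader's log is the initial state it syncs to the other members.
    print(quorum, "->", leader, "has zxid 10:", 10 in logs[leader])
```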

You can read this thread http://zookeeper-user.578899.n2.nabble.com/Zab-Failure-scenario-td7581583.html for more info

What do you think? Does anyone have any questions or concerns about such a (small) optimization?

Ibrahim

From: Rakesh Radhakrishnan [mailto:rakeshr.apache@gmail.com]
Sent: Thursday, October 01, 2015 06:15 PM
To: Ibrahim El-sanosi (PGR)
Subject: Re: 3-server Zab cluster

>>>>>>>> (***) OK, I thought that when F2 forms a quorum with L, and before serving clients, L synchronizes its state with F2, so that zxid=10 is committed in both L and F2. I also thought this process is the same as in Zab, isn't it?

Since L didn't receive any ACK responses from F1 or F2 before it previously lost its leader status, L won't commit transaction zxid=10. IIUC, after the new quorum is formed, L will not have any mechanism to re-initiate the proposal (active messaging phase) for the previous zxid=10.

-Rakesh
