Posted to user@zookeeper.apache.org by Al...@ubs.com on 2012/05/09 15:27:07 UTC

Possible issue with cluster availability following new Leader Election - ZK 3.4

Hi Zookeeper devs and users, 


I've been doing some load and failover testing on the ZK 3.4 branch
using moderately large data sets (700mb and 20k nodes) and I think there
could be an issue. 

When I bring down the leader of a 3 node cluster, it takes around 20-30
seconds for the cluster as a whole to become available again. 
This is because once a new leader is elected it pushes out a snapshot to
all the peers who in turn persist it locally before sending an ack back.
Only then does the leader decide he has a valid quorum. In this case
pretty much all the time is taken up sending the data over the network
and re-saving it. 

Granted, I'm testing this on some low-spec VMs, so I wouldn't expect a
real-world sync for a data set that size to take anything like as long.
However, is this not a significant constraint on availability if,
whenever a leader fails, a full snapshot needs to be sent to and persisted
by a quorum of peers before the cluster as a whole can be deemed
available?

I notice when a peer joins a stable cluster as a follower,
synchronization is implemented via diffs and the peer is quickly
available for client connections provided it already had an up to date
local state. 
Should something similar not be possible when a new leader is
elected? A quick glance at the code (line 390 of LearnerHandler)
suggests there is some logic to send an empty diff, but I never see this
triggered.

I am not mutating any state in the cluster whilst I am bringing stuff
up and down, so is this behaviour a bug or by design?

I saw a related question
(http://zookeeper-user.578899.n2.nabble.com/leader-election-length-td7086868.html#a7089472)
a few months back that touched on this, but there was not much follow-up.

Many thanks

Alex





Visit our website at http://www.ubs.com 

This message contains confidential information and is intended only 
for the individual named. If you are not the named addressee you 
should not disseminate, distribute or copy this e-mail. Please 
notify the sender immediately by e-mail if you have received this 
e-mail by mistake and delete this e-mail from your system. 

E-mails are not encrypted and cannot be guaranteed to be secure or 
error-free as information could be intercepted, corrupted, lost, 
destroyed, arrive late or incomplete, or contain viruses. The sender 
therefore does not accept liability for any errors or omissions in the 
contents of this message which arise as a result of e-mail transmission. 
If verification is required please request a hard-copy version. This 
message is provided for informational purposes and should not be 
construed as a solicitation or offer to buy or sell any securities 
or related financial instruments. 

UBS Limited is a company limited by shares incorporated in the United 
Kingdom registered in England and Wales with number 2035362. 
Registered office: 1 Finsbury Avenue, London EC2M 2PP.  UBS Limited 
is authorised and regulated by the Financial Services Authority. 

UBS AG is a public company incorporated with limited liability in 
Switzerland domiciled in the Canton of Basel-City and the Canton of 
Zurich respectively registered at the Commercial Registry offices in 
those Cantons with Identification No: CH-270.3.004.646-4 and having 
respective head offices at Aeschenvorstadt 1, 4051 Basel and 
Bahnhofstrasse 45, 8001 Zurich, Switzerland.  Registered in the 
United Kingdom as a foreign company with No: FC021146 and having a 
UK Establishment registered at Companies House, Cardiff, with No:  
BR 004507.  The principal office of UK Establishment: 1 Finsbury Avenue, 
London EC2M 2PP.  In the United Kingdom, UBS AG is authorised and 
regulated by the Financial Services Authority.

UBS reserves the right to retain all messages. Messages are protected 
and accessed only in legally justified cases. 

RE: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Al...@ubs.com.
Looks like it is pretty closely correlated to data set size. With an
empty set, it takes a couple of seconds (still quite high?). With a set
half the size it takes roughly half the time. 

I had issues where it would never settle down once the set got above a
certain size. I worked around this by increasing the 'initLimit' and the
'syncLimit' properties. The latter controlled how long the leader would
wait for the follower to ACK his snapshot push. 
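
For reference, both properties live in zoo.cfg and are counted in ticks, so
the effective timeout is the limit multiplied by tickTime. As documented,
initLimit bounds how long followers may take to connect and sync to a leader,
and syncLimit bounds how far a follower may fall behind. A minimal sketch of
raised limits follows; the values are illustrative assumptions, not the
settings used in these tests.

```
# zoo.cfg fragment; all values below are illustrative assumptions
# length of one tick in milliseconds
tickTime=2000
# 30 ticks x 2000 ms = 60 s for a follower to connect and sync to the leader
initLimit=30
# 10 ticks x 2000 ms = 20 s before a lagging follower is dropped
syncLimit=10
```

With a large data set, the time to transfer and persist a snapshot has to fit
inside the initLimit window, which is why raising it can let the ensemble
settle.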



  

-----Original Message-----
From: Mark Gius [mailto:mgius7096@gmail.com] 
Sent: 09 May 2012 18:18
To: user@zookeeper.apache.org
Subject: Re: Possible issue with cluster availability following new
Leader Election - ZK 3.4

I'm encountering a similar issue with a more or less empty dataset.  I
bring up a cluster of 3 servers and shoot one of them in the head.  It
takes ~20 seconds for the two remaining hosts to settle and begin
responding to requests again.  If you're certain that your delay is due
to pushing down the larger dataset, then we may be seeing different
problems.

Mark


Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Camille Fournier <ca...@apache.org>.
This pretty much matches what I expect. It would be great if you
wanted to try your hand at creating a patch and submitting it to the
ticket that was created for this problem, but if not, please post this
analysis to issue 1465 and we'll look at it ASAP.

C


Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Vinayak Khot <vi...@nutanix.com>.
We have also encountered a problem where the newly elected leader sends an
entire snapshot to a follower even though the follower is in sync with the
leader.

A closer look at the code shows that the problem is in the logic that
decides whether to send a snapshot. The following scenario explains the
problem in detail.

Start a 3 node ZooKeeper ensemble where every quorum member has seen the
same changes, up to zxid 0x400000004.

1. When a newly elected leader starts, it bumps up its zxid to the new
epoch.

Code snippet from Leader.java:

long epoch = getEpochToPropose(self.getId(), self.getAcceptedEpoch());
zk.setZxid(ZxidUtils.makeZxid(epoch, 0));
synchronized(this){
    lastProposed = zk.getZxid();  // 0x500000000
}
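
To make the epoch bump concrete, here is a minimal, self-contained sketch of
how a zxid packs the epoch into its high 32 bits and a counter into its low
32 bits. The class name ZxidSketch and its helpers are illustrative stand-ins
written for this example, not the ZooKeeper classes themselves.

```java
// Illustrative sketch only: a stand-in for the epoch/counter packing,
// not ZooKeeper's own ZxidUtils class.
final class ZxidSketch {
    // Pack an epoch (high 32 bits) and a counter (low 32 bits) into one zxid.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Extract the epoch from the high 32 bits.
    static long epochOf(long zxid) {
        return zxid >> 32;
    }

    // Extract the counter from the low 32 bits.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static void main(String[] args) {
        // Last transaction seen by every member: epoch 4, counter 4.
        long followerZxid = makeZxid(4, 4);
        // A newly elected leader moves to the next epoch with counter 0.
        long newLeaderZxid = makeZxid(epochOf(followerZxid) + 1, 0);

        System.out.println(Long.toHexString(followerZxid));  // 400000004
        System.out.println(Long.toHexString(newLeaderZxid)); // 500000000
    }
}
```

The two values can never be equal once the epoch has been bumped, even though
no new transaction has been processed in between.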

2. Now a follower tries to join the leader with its peerLastZxid =
0x400000004.

Note that the leader now has an in-memory committedLog list with
maxCommittedLog = 0x400000004.

As committedLog doesn't have any new transactions with zxid >
peerLastZxid, we check whether the leader and follower are in sync.

Code snippet from LearnerHandler.java:

leaderLastZxid = leader.startForwarding(this, updates);
if (peerLastZxid == leaderLastZxid) {   // 0x400000004 == 0x500000000, so false
    // We are in sync so we'll do an empty diff
    packetToSend = Leader.DIFF;
    zxidToSend = leaderLastZxid;
}

Note that the function leader.startForwarding() returns the lastProposed
zxid, which the leader has already set to 0x500000000. So in this scenario
we never send an empty diff even though the leader and follower are in
sync, and we end up sending the entire snapshot in the code that follows
the above check.

A possible fix would be to keep a lastProcessedZxid in the leader, which
is updated only when the leader processes a transaction. While syncing
with a follower, if the peerLastZxid sent by the follower is the same as
the leader's lastProcessedZxid, we can send an empty diff to the follower.
This would avoid unnecessarily sending an entire snapshot when the leader
and follower are already in sync.
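
The difference between the two comparisons can be sketched in isolation. This
is a simplified model under stated assumptions, not the LearnerHandler code:
the class SyncDecisionSketch, the Packet enum and the choose() helper are
hypothetical, and the model covers only the equality check, treating every
other case as a snapshot.

```java
// Simplified model of the equality check only; all names here are hypothetical.
final class SyncDecisionSketch {
    enum Packet { DIFF, SNAP }

    // An empty DIFF is chosen only when the follower's zxid equals the zxid
    // the leader compares against; anything else is modelled as a snapshot.
    static Packet choose(long peerLastZxid, long leaderZxid) {
        return peerLastZxid == leaderZxid ? Packet.DIFF : Packet.SNAP;
    }

    public static void main(String[] args) {
        long peerLastZxid = 0x400000004L;      // follower's last seen zxid
        long lastProcessedZxid = 0x400000004L; // last transaction the leader applied
        long lastProposed = 0x500000000L;      // bumped to the new epoch on election

        // Comparing against lastProposed never matches after an election.
        System.out.println(choose(peerLastZxid, lastProposed));      // SNAP
        // Comparing against lastProcessedZxid recognises the in-sync follower.
        System.out.println(choose(peerLastZxid, lastProcessedZxid)); // DIFF
    }
}
```

In this model the only change is which leader-side zxid is compared, which is
the essence of the proposed fix.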

ZooKeeper developers, please share your views on the above-mentioned issue.

- Vinayak

On Mon, May 14, 2012 at 8:30 AM, Camille Fournier <ca...@apache.org> wrote:

> Thanks.
> I just ran a couple of tests to start the debugging. Mark, I don't see
> a long cluster settle with a mostly empty data set, so I think this
> might be two different problems. I do see a lot of snapshots being
> sent though so there is probably some overaggressiveness in the way
> that we evaluate when to send snapshots that should be evaluated.
> Adding the dev mailing list, as I may need ben or flavio to take a
> look as well.
>
> C
>
> On Thu, May 10, 2012 at 10:48 AM,  <Al...@ubs.com> wrote:
> > Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465
> >
> >
> >
> > -----Original Message-----
> > From: Camille Fournier [mailto:camille@apache.org]
> > Sent: 10 May 2012 14:58
> > To: user@zookeeper.apache.org
> > Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
> >
> > I will take a look at this soon, have you created a Jira for it? If not please do so.
> >
> > Thanks,
> > C
> >
> > On Thu, May 10, 2012 at 7:20 AM,  <Al...@ubs.com> wrote:
> >> I think there may be a problem here with the 3.4 branch. I dropped the
> >> cluster back to 3.3.5 and the behaviour was much better.
> >>
> >> To summarize:
> >>
> >> 650mb of data
> >> 20k nodes of varied size
> >> 3 node cluster
> >>
> >> On 3.4.x (using latest branch build)
> >> ---------
> >> Takes 3-4 minutes to bring up a cluster from cold
> >> Takes 40-50 secs to recover from a leader failure
> >> Takes 10 secs for a new follower to join the cluster
> >>
> >> On 3.3.5
> >> --------
> >> Takes 10-20 secs to bring up a cluster from cold
> >> Takes 10 secs to recover from a leader failure
> >> Takes 10 secs for a new follower to join the cluster
> >>
> >> Any views on this from the ZK devs? The differences in behaviour only
> >> start becoming apparent as the dataset gets bigger.
> >> I was hoping to use 3.4 for the transactional features it offered via
> >> the 'multi-update' operations, but this issue seems pretty serious...
> >>

Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Camille Fournier <ca...@apache.org>.
Thanks.
I just ran a couple of tests to start the debugging. Mark, I don't see
a long cluster settle with a mostly empty data set, so I think this
might be two different problems. I do see a lot of snapshots being
sent though so there is probably some overaggressiveness in the way
that we evaluate when to send snapshots that should be evaluated.
Adding the dev mailing list, as I may need ben or flavio to take a
look as well.

C

On Thu, May 10, 2012 at 10:48 AM,  <Al...@ubs.com> wrote:
> Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465
>
>
>
> -----Original Message-----
> From: Camille Fournier [mailto:camille@apache.org]
> Sent: 10 May 2012 14:58
> To: user@zookeeper.apache.org
> Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4
>
> I will take a look at this soon, have you created a Jira for it? If not please do so.
>
> Thanks,
> C
>
> On Thu, May 10, 2012 at 7:20 AM,  <Al...@ubs.com> wrote:
>> I think there may be a problem here with the 3.4 branch. I dropped the
>> cluster back to 3.3.5 and the behaviour was much better.
>>
>> To summarize:
>>
>> 650mb of data
>> 20k nodes of varied size
>> 3 node cluster
>>
>> On 3.4.x (using latest branch build)
>> ---------
>> Takes 3-4 minutes to bring up a cluster from cold
>> Takes 40-50 secs to recover from a leader failure
>> Takes 10 secs for a new follower to join the cluster
>>
>> On 3.3.5
>> --------
>> Takes 10-20 secs to bring up a cluster from cold
>> Takes 10 secs to recover from a leader failure
>> Takes 10 secs for a new follower to join the cluster
>>
>> Any views on this from the ZK devs? The differences in behaviour only
>> start becoming apparent as the dataset gets bigger.
>> I was hoping to use 3.4 for the transactional features it offered via
>> the 'multi-update' operations, but this issue seems pretty serious...

RE: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Al...@ubs.com.
Cheers - Raised https://issues.apache.org/jira/browse/ZOOKEEPER-1465

 

-----Original Message-----
From: Camille Fournier [mailto:camille@apache.org] 
Sent: 10 May 2012 14:58
To: user@zookeeper.apache.org
Subject: Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

I will take a look at this soon, have you created a Jira for it? If not please do so.

Thanks,
C

On Thu, May 10, 2012 at 7:20 AM,  <Al...@ubs.com> wrote:
> I think there may be a problem here with the 3.4 branch. I dropped the 
> cluster back to 3.3.5 and the behaviour was much better.
>
> To summarize:
>
> 650mb of data
> 20k nodes of varied size
> 3 node cluster
>
> On 3.4.x (using latest branch build)
> ---------
> Takes 3-4 minutes to bring up a cluster from cold
> Takes 40-50 secs to recover from a leader failure
> Takes 10 secs for a new follower to join the cluster
>
> On 3.3.5
> --------
> Takes 10-20 secs to bring up a cluster from cold
> Takes 10 secs to recover from a leader failure
> Takes 10 secs for a new follower to join the cluster
>
> Any views on this from the ZK devs? The differences in behaviour only 
> start becoming apparent as the dataset gets bigger.
> I was hoping to use 3.4 for the transactional features it offered via 
> the 'multi-update' operations, but this issue seems pretty serious...
>

Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Camille Fournier <ca...@apache.org>.
I will take a look at this soon. Have you created a Jira for it? If
not, please do so.

Thanks,
C

On Thu, May 10, 2012 at 7:20 AM,  <Al...@ubs.com> wrote:
> I think there may be a problem here with the 3.4 branch. I dropped the
> cluster back to 3.3.5 and the behaviour was much better.
>
> To summarize:
>
> 650mb of data
> 20k nodes of varied size
> 3 node cluster
>
> On 3.4.x (using latest branch build)
> ---------
> Takes 3-4 minutes to bring up a cluster from cold
> Takes 40-50 secs to recover from a leader failure
> Takes 10 secs for a new follower to join the cluster
>
> On 3.3.5
> --------
> Takes 10-20 secs to bring up a cluster from cold
> Takes 10 secs to recover from a leader failure
> Takes 10 secs for a new follower to join the cluster
>
> Any views on this from the ZK devs? The differences in behaviour only
> start becoming apparent as the dataset gets bigger.
> I was hoping to use 3.4 for the transactional features it offered via
> the 'multi-update' operations, but this issue seems pretty serious...

RE: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Al...@ubs.com.
I think there may be a problem here with the 3.4 branch. I dropped the
cluster back to 3.3.5 and the behaviour was much better.

To summarize:

650mb of data 
20k nodes of varied size
3 node cluster

On 3.4.x (using latest branch build)
---------
Takes 3-4 minutes to bring up a cluster from cold
Takes 40-50 secs to recover from a leader failure
Takes 10 secs for a new follower to join the cluster

On 3.3.5
--------
Takes 10-20 secs to bring up a cluster from cold
Takes 10 secs to recover from a leader failure
Takes 10 secs for a new follower to join the cluster

Any views on this from the ZK devs? The differences in behaviour only
start becoming apparent as the dataset gets bigger. 
I was hoping to use 3.4 for the transactional features it offered via
the 'multi-update' operations, but this issue seems pretty serious...
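
For anyone wanting to reproduce timings like the ones above, here is a
minimal sketch of one way to measure them. It is not the harness used for
the numbers in this message. It assumes the servers answer the "srvr"
four-letter command on their client port, and it treats the ensemble as
available once one server reports "Mode: leader" and a majority report
leader or follower. The function names, poll interval and time limit are
illustrative.

```python
import socket
import time


def four_letter_word(host, port, cmd="srvr", timeout=2.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(reply):
    """Return the value of the 'Mode:' line (e.g. 'leader'), or None if absent."""
    for line in reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


def seconds_until_serving(servers, query=four_letter_word, poll=0.5, limit=600.0):
    """Poll every server until a leader plus a majority are serving.

    Returns the elapsed seconds, or None if `limit` seconds pass first.
    `servers` lists the whole ensemble, including any member that is down.
    """
    start = time.monotonic()
    while time.monotonic() - start < limit:
        modes = []
        for host, port in servers:
            try:
                modes.append(parse_mode(query(host, port)))
            except OSError:  # refused, unreachable or timed out
                modes.append(None)
        serving = [m for m in modes if m in ("leader", "follower")]
        if "leader" in serving and len(serving) > len(servers) // 2:
            return time.monotonic() - start
        time.sleep(poll)
    return None
```

Start it immediately after killing the leader, for example
seconds_until_serving([("zk1", 2181), ("zk2", 2181), ("zk3", 2181)]),
where the host names are placeholders for your own servers.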




Re: Possible issue with cluster availability following new Leader Election - ZK 3.4

Posted by Mark Gius <mg...@gmail.com>.
I'm encountering a similar issue with a more or less empty dataset.  I
bring up a cluster of 3 servers and shoot one of them in the head.  It
takes ~20 seconds for the two remaining hosts to settle and begin
responding to requests again.  If you're certain that your delay is due to
pushing down the larger dataset then we may be seeing different problems.
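
To make the diff-versus-snapshot distinction in the quoted report
concrete, here is a simplified model of the choice a leader makes when a
learner connects. It is a sketch, not the actual LearnerHandler code: the
function name and parameters are invented for illustration, and the real
logic has more cases. DIFF, TRUNC and SNAP are the sync packet types the
leader can send.

```python
def choose_sync(peer_last_zxid, leader_last_zxid, min_committed, max_committed):
    """Simplified model of which sync packet a leader might send a learner.

    min_committed / max_committed bound the leader's in-memory log of
    recently committed proposals; pass None for both if that log is empty.
    """
    if peer_last_zxid == leader_last_zxid:
        return "DIFF"  # empty diff: the peer is already up to date
    have_log = min_committed is not None and max_committed is not None
    if have_log and min_committed <= peer_last_zxid <= max_committed:
        return "DIFF"  # replay only the proposals the peer is missing
    if have_log and peer_last_zxid > max_committed:
        return "TRUNC"  # peer is ahead of the leader's log and must roll back
    return "SNAP"  # anything else: ship the full snapshot
```

In this model a peer receives a full snapshot whenever its last zxid
differs from the leader's and falls outside the leader's in-memory
committed log, which is the expensive path for a large dataset.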

Mark

On Wed, May 9, 2012 at 6:27 AM, <Al...@ubs.com> wrote:

>
> Hi Zookeeper devs and users,
>
>
> I've been doing some load and failover testing on the ZK 3.4 branch
> using moderately large data sets (700mb and 20k nodes) and I think there
> could be an issue.
>
> When I bring down the leader of a 3 node cluster, it takes around 20-30
> seconds for the cluster as a whole to become available again.
> This is because once a new leader is elected it pushes out a snapshot to
> all the peers who in turn persist it locally before sending an ack back.
> Only then does the leader decide he has a valid quorum. In this case
> pretty much all the time is taken up sending the data over the network
> and re-saving it.
>
> Granted I'm testing this on some low-spec VM's so I wouldn't expect a
> real-world sync for a data set that size to take anything like as long.
> However is this not a significant constraint on availability if,
> whenever a leader fails, a full snapshot needs to be sent to and persisted
> by a quorum of peers before the cluster as a whole can be deemed as
> available?
>
> I notice when a peer joins a stable cluster as a follower,
> synchronization is implemented via diffs and the peer is quickly
> available for client connections provided it already had an up to date
> local state.
> Should something similar not be possible when a new leader is
> elected? A quick glance at the code (line 390 of LearnerHandler)
> suggests there is some logic to send an empty diff but I never see this
> triggered.
>
> I am not mutating any state in the cluster whilst I am bringing stuff
> up and down so is this behaviour a bug or by design?
>
> I saw a related question
> (http://zookeeper-user.578899.n2.nabble.com/leader-election-length-td7086868.html#a7089472)
> a few months back that touched on this, but there was not much follow up.
>
> Many thanks
>
> Alex