Posted to user@aurora.apache.org by "Erb, Stephan" <St...@blue-yonder.com> on 2016/08/24 12:19:54 UTC

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

The curator backend has been working well for us so far. I believe it is safe to make it the default for the next release, and to drop the old code in the release after that.

From: John Sirois <js...@apache.org>
Reply-To: "user@aurora.apache.org" <us...@aurora.apache.org>, "jsirois@apache.org" <js...@apache.org>
Date: Thursday 7 July 2016 at 01:13
To: Martin Hrabovčin <ma...@gmail.com>
Cc: "dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell <jf...@apache.org>, "user@aurora.apache.org" <us...@aurora.apache.org>
Subject: Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Now that 0.15.0 has been released, I thought I'd check in on any progress folks have made with testing or deploying 0.14.0+ with the Aurora scheduler `-zk_use_curator` flag in place.
There has been 1 fix that will go out in the 0.16.0 release to reduce logger noise on shutdown [1][2] but I have heard no negative (or positive) feedback otherwise.

[1] https://issues.apache.org/jira/browse/AURORA-1729
[2] https://reviews.apache.org/r/49578/

On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org> wrote:

On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin <ma...@gmail.com> wrote:
How should this flag be rolled out to an existing running cluster? Can it be done with a rolling update, instance by instance, or do we need to stop the whole cluster and then bring all nodes up with the new flag?

I recommend taking the whole cluster down, upgrading and adding the new flag, then bringing it back up.

A rolling update should work, but will likely be rocky.  My analysis:

The Aurora leader election consists of 2 components: the actual leader election, and the resulting advertisement by the leader of itself as the Aurora service endpoint.  Both components use ZooKeeper, and of the 2 I only ensured that the advertisement was compatible with old releases (old clients). The leader election portion is completely internal to the Aurora scheduler instances vying for leadership and, under Curator, uses a different (enhanced) ZooKeeper node scheme.  As a result, this is what could happen in a slow roll:

before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
upgrade 0: new-lead, 1: old-lead, 2: old-follow

Here, node 0 will see itself as leader and nodes 1 and 2 will see node 1 as leader. The result will be both node 0 and node 1 attempting to read the Mesos distributed log.  Now the log uses its own leader election and, as things stand, the reader must be the leader, so the Aurora-level leadership "tie" will be broken when one of the 2 Aurora-level leaders fails to become the Mesos distributed log leader; that node will restart its lifecycle, i.e. flap.  This will continue to be the case after the second node is upgraded and will not stabilize until the 3rd node is upgraded.
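
The slow-roll scenario above can be sketched as a toy model. This is only an illustration under stated assumptions, not Aurora or Curator code: the function names are hypothetical, each election scheme is assumed to elect independently among the schedulers that use it, and "lowest node id wins" stands in for whatever ordering the real election applies.

```python
# Toy model of the slow roll; not Aurora code. Assumption: the legacy and
# Curator election schemes use different ZooKeeper nodes, so each scheme
# elects its own leader among the schedulers that participate in it.

def leaders(schemes):
    """Return {scheme: leader node id} for a list of per-node schemes."""
    result = {}
    for node, scheme in enumerate(schemes):
        # Hypothetical tie-break: the lowest node id in each scheme leads.
        result.setdefault(scheme, node)
    return result


def slow_roll(node_count=3):
    """Yield (upgraded_count, leaders) for each step of a rolling upgrade."""
    for upgraded in range(node_count + 1):
        schemes = ["new"] * upgraded + ["old"] * (node_count - upgraded)
        yield upgraded, leaders(schemes)


for upgraded, found in slow_roll():
    status = "stable" if len(found) == 1 else "two leaders, expect flapping"
    print(f"{upgraded} upgraded: {found} -> {status}")
```

In this model the cluster has a single Aurora-level leader only before the first upgrade and after the last one, which is why a full stop, upgrade, and start avoids the flapping window entirely.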

2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
+1, will enable on our test clusters to help verify

-Jake

On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org> wrote:

> I'd like to move forward with
> https://issues.apache.org/jira/browse/AURORA-1669 asap; ie: removing
> legacy
> (Twitter) commons zookeeper libraries used for Aurora leader election in
> favor of Apache Curator libraries. The change submitted in
> https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0 and
> Apache
> Curator based service discovery can be enabled with the Aurora scheduler
> flag `-zk_use_curator`.  I'd like feedback from users who enable this
> option.  If you have a test cluster where you can enable `-zk_use_curator`
> and exercise leader failure and failover, I'd be grateful. If you have
> moved to using this option in production with demonstrable improvements or
> even maintenance of status quo, I'd also be grateful for this news. If
> you've found regressions or new bugs, I'd love to know about those as well.
>
> Thanks in advance to all those who find time to test this out on real
> systems!
>

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Posted by Zameer Manji <zm...@apache.org>.

I managed to deploy this code in a test cluster and observed no issues.

I still advocate for dropping the old code when we change the default but I
understand concerns that it is risky.

On Mon, Aug 29, 2016 at 1:39 PM, John Sirois <js...@apache.org> wrote:

> Thanks for the feedback folks! I'll post a flag default switch shortly.
>
> On Wed, Aug 24, 2016 at 12:20 PM, Joshua Cohen <jc...@apache.org> wrote:
>
> > I have this enabled in a test cluster and have not noticed any issues
> with
> > it yet. I'd like to roll it out to production before we drop the old code
> > though.
> >
>
> Agreed.  This deserves caution, and fwict the jvm leader code is ~never in
> the refactor path; so even though I too am eager to delete the code, it is
> not an active refactoring burden.
>
>
> > On Wed, Aug 24, 2016 at 1:10 PM, Zameer Manji <zm...@apache.org> wrote:
> >
> >> Could we change the default and drop the old code at the same time? I
> >> don't
> >> see any benefit of letting that hang around.
> >>
> >> I have not tested this code yet, but I hope to do it soon.
> >>
> >> On Wed, Aug 24, 2016 at 5:19 AM, Erb, Stephan <
> >> Stephan.Erb@blue-yonder.com>
> >> wrote:
> >>
> >> > The curator backend has been working well for us so far. I believe it
> is
> >> > safe to make it the default for the next release, and to drop the old
> >> code
> >> > in the release after that.
> >> >
> >> >
> >> >
> >> > *From: *John Sirois <js...@apache.org>
> >> > *Reply-To: *"user@aurora.apache.org" <us...@aurora.apache.org>, "
> >> > jsirois@apache.org" <js...@apache.org>
> >> > *Date: *Thursday 7 July 2016 at 01:13
> >> > *To: *Martin Hrabovčin <ma...@gmail.com>
> >> > *Cc: *"dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell <
> >> > jfarrell@apache.org>, "user@aurora.apache.org" <
> user@aurora.apache.org>
> >> > *Subject: *Re: [FEEDBACK] Transitioning Aurora leader election to
> Apache
> >>
> >> > Curator (`-zk_use_curator`)
> >> >
> >> >
> >> >
> >> > Now that 0.15.0 has been released, I thought I'd check in on any
> >> progress
> >> > folks have made with testing/deploying the 0.14.0+ with the Aurora
> >> > Scheduler `-zk_use_curator` flag in-place.
> >> >
> >> > There has been 1 fix that will go out in the 0.16.0 release to reduce
> >> > logger noise on shutdown [1][2] but I have heard no negative (or
> >> positive)
> >> > feedback otherwise.
> >> >
> >> >
> >> >
> >> > [1] https://issues.apache.org/jira/browse/AURORA-1729
> >> >
> >> > [2] https://reviews.apache.org/r/49578/
> >> >
> >> >
> >> >
> >> > On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org>
> >> wrote:
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin <
> >> > martin.hrabovcin@gmail.com> wrote:
> >> >
> >> > How should this flag be rolled out to an existing running cluster? Can it
> >> > be done with a rolling update, instance by instance, or do we need to stop
> >> > the whole cluster and then bring all nodes up with the new flag?
> >> >
> >> >
> >> >
> >> > I recommend a whole cluster down, upgrade +  new flag, up.
> >> >
> >> >
> >> >
> >> > A rolling update should work, but will likely be rocky.  My analysis:
> >> >
> >> >
> >> >
> >> > The Aurora leader election consists of 2 components, the actual leader
> >> > election and the resulting advertisement by the leader of itself as
> the
> >> > Aurora service endpoint.  These 2 components each use zookeeper and of
> >> the
> >> > 2 I only ensured that the advertisement was compatible with old
> releases
> >> > (old clients). The leader election portion is completely internal to
> the
> >> > Aurora scheduler instances vying for leadership and, under Curator,
> >> uses a
> >> > different (enhanced), zookeeper node scheme.  As a result, this is
> what
> >> > could happen in a slow roll:
> >> >
> >> >
> >> >
> >> > before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
> >> >
> >> > upgrade 0: new-lead, 1: old-lead, 2: old-follow
> >> >
> >> >
> >> >
> >> > Here, node 0 will see itself as leader and nodes 1 and 2 will see
> node 1
> >> > as leader. The result will be both node 0 and node 1 attempting to
> read
> >> the
> >> > mesos distributed log.  Now the log uses its own leader election and
> the
> >> > reader must be the leader as things stand, so the Aurora-level
> >> leadership
> >> > "tie" will be broken by one of the 2 Aurora-level leaders failing to
> >> become
> >> > the mesos distributed log leader, and that node will restart its
> >> lifecycle
> >> > - ie flap.  This will continue to be the case with second node upgrade
> >> and
> >> > will not stabilize until the 3rd node is upgraded.
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > 2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
> >> >
> >> > +1, will enable on our test clusters to help verify
> >> >
> >> > -Jake
> >> >
> >> >
> >> > On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org>
> >> wrote:
> >> >
> >> > > I'd like to move forward with
> >> > > https://issues.apache.org/jira/browse/AURORA-1669 asap; ie:
> removing
> >> > > legacy
> >> > > (Twitter) commons zookeeper libraries used for Aurora leader
> election
> >> in
> >> > > favor of Apache Curator libraries. The change submitted in
> >> > > https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0
> and
> >> > > Apache
> >> > > Curator based service discovery can be enabled with the Aurora
> >> scheduler
> >> > > flag `-zk_use_curator`.  I'd like feedback from users who enable
> this
> >> > > option.  If you have a test cluster where you can enable
> >> > `-zk_use_curator`
> >> > > and exercise leader failure and failover, I'd be grateful. If you
> have
> >> > > moved to using this option in production with demonstrable
> >> improvements
> >> > or
> >> > > even maintenance of status quo, I'd also be grateful for this news.
> If
> >> > > you've found regressions or new bugs, I'd love to know about those
> as
> >> > well.
> >> > >
> >> > > Thanks in advance to all those who find time to test this out on
> real
> >> > > systems!
> >> > >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >
> >
>

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Posted by John Sirois <js...@apache.org>.

Thanks for the feedback folks! I'll post a flag default switch shortly.

On Wed, Aug 24, 2016 at 12:20 PM, Joshua Cohen <jc...@apache.org> wrote:

> I have this enabled in a test cluster and have not noticed any issues with
> it yet. I'd like to roll it out to production before we drop the old code
> though.
>

Agreed.  This deserves caution, and fwict the jvm leader code is ~never in
the refactor path; so even though I too am eager to delete the code, it is
not an active refactoring burden.


> On Wed, Aug 24, 2016 at 1:10 PM, Zameer Manji <zm...@apache.org> wrote:
>
>> Could we change the default and drop the old code at the same time? I
>> don't
>> see any benefit of letting that hang around.
>>
>> I have not tested this code yet, but I hope to do it soon.
>>
>> On Wed, Aug 24, 2016 at 5:19 AM, Erb, Stephan <
>> Stephan.Erb@blue-yonder.com>
>> wrote:
>>
>> > The curator backend has been working well for us so far. I believe it is
>> > safe to make it the default for the next release, and to drop the old
>> code
>> > in the release after that.
>> >
>> >
>> >
>> > *From: *John Sirois <js...@apache.org>
>> > *Reply-To: *"user@aurora.apache.org" <us...@aurora.apache.org>, "
>> > jsirois@apache.org" <js...@apache.org>
>> > *Date: *Thursday 7 July 2016 at 01:13
>> > *To: *Martin Hrabovčin <ma...@gmail.com>
>> > *Cc: *"dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell <
>> > jfarrell@apache.org>, "user@aurora.apache.org" <us...@aurora.apache.org>
>> > *Subject: *Re: [FEEDBACK] Transitioning Aurora leader election to Apache
>>
>> > Curator (`-zk_use_curator`)
>> >
>> >
>> >
>> > Now that 0.15.0 has been released, I thought I'd check in on any
>> progress
>> > folks have made with testing/deploying the 0.14.0+ with the Aurora
>> > Scheduler `-zk_use_curator` flag in-place.
>> >
>> > There has been 1 fix that will go out in the 0.16.0 release to reduce
>> > logger noise on shutdown [1][2] but I have heard no negative (or
>> positive)
>> > feedback otherwise.
>> >
>> >
>> >
>> > [1] https://issues.apache.org/jira/browse/AURORA-1729
>> >
>> > [2] https://reviews.apache.org/r/49578/
>> >
>> >
>> >
>> > On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org>
>> wrote:
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin <
>> > martin.hrabovcin@gmail.com> wrote:
>> >
>> > How should this flag be rolled out to an existing running cluster? Can it
>> > be done with a rolling update, instance by instance, or do we need to stop
>> > the whole cluster and then bring all nodes up with the new flag?
>> >
>> >
>> >
>> > I recommend a whole cluster down, upgrade +  new flag, up.
>> >
>> >
>> >
>> > A rolling update should work, but will likely be rocky.  My analysis:
>> >
>> >
>> >
>> > The Aurora leader election consists of 2 components, the actual leader
>> > election and the resulting advertisement by the leader of itself as the
>> > Aurora service endpoint.  These 2 components each use zookeeper and of
>> the
>> > 2 I only ensured that the advertisement was compatible with old releases
>> > (old clients). The leader election portion is completely internal to the
>> > Aurora scheduler instances vying for leadership and, under Curator,
>> uses a
>> > different (enhanced), zookeeper node scheme.  As a result, this is what
>> > could happen in a slow roll:
>> >
>> >
>> >
>> > before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
>> >
>> > upgrade 0: new-lead, 1: old-lead, 2: old-follow
>> >
>> >
>> >
>> > Here, node 0 will see itself as leader and nodes 1 and 2 will see node 1
>> > as leader. The result will be both node 0 and node 1 attempting to read
>> the
>> > mesos distributed log.  Now the log uses its own leader election and the
>> > reader must be the leader as things stand, so the Aurora-level
>> leadership
>> > "tie" will be broken by one of the 2 Aurora-level leaders failing to
>> become
>> > the mesos distributed log leader, and that node will restart its
>> lifecycle
>> > - ie flap.  This will continue to be the case with second node upgrade
>> and
>> > will not stabilize until the 3rd node is upgraded.
>> >
>> >
>> >
>> >
>> >
>> > 2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
>> >
>> > +1, will enable on our test clusters to help verify
>> >
>> > -Jake
>> >
>> >
>> > On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org>
>> wrote:
>> >
>> > > I'd like to move forward with
>> > > https://issues.apache.org/jira/browse/AURORA-1669 asap; ie: removing
>> > > legacy
>> > > (Twitter) commons zookeeper libraries used for Aurora leader election
>> in
>> > > favor of Apache Curator libraries. The change submitted in
>> > > https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0 and
>> > > Apache
>> > > Curator based service discovery can be enabled with the Aurora
>> scheduler
>> > > flag `-zk_use_curator`.  I'd like feedback from users who enable this
>> > > option.  If you have a test cluster where you can enable
>> > `-zk_use_curator`
>> > > and exercise leader failure and failover, I'd be grateful. If you have
>> > > moved to using this option in production with demonstrable
>> improvements
>> > or
>> > > even maintenance of status quo, I'd also be grateful for this news. If
>> > > you've found regressions or new bugs, I'd love to know about those as
>> > well.
>> > >
>> > > Thanks in advance to all those who find time to test this out on real
>> > > systems!
>> > >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>
>

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Posted by Joshua Cohen <jc...@apache.org>.

I have this enabled in a test cluster and have not noticed any issues with
it yet. I'd like to roll it out to production before we drop the old code
though.

On Wed, Aug 24, 2016 at 1:10 PM, Zameer Manji <zm...@apache.org> wrote:

> Could we change the default and drop the old code at the same time? I don't
> see any benefit of letting that hang around.
>
> I have not tested this code yet, but I hope to do it soon.
>
> On Wed, Aug 24, 2016 at 5:19 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com
> >
> wrote:
>
> > The curator backend has been working well for us so far. I believe it is
> > safe to make it the default for the next release, and to drop the old
> code
> > in the release after that.
> >
> >
> >
> > *From: *John Sirois <js...@apache.org>
> > *Reply-To: *"user@aurora.apache.org" <us...@aurora.apache.org>, "
> > jsirois@apache.org" <js...@apache.org>
> > *Date: *Thursday 7 July 2016 at 01:13
> > *To: *Martin Hrabovčin <ma...@gmail.com>
> > *Cc: *"dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell <
> > jfarrell@apache.org>, "user@aurora.apache.org" <us...@aurora.apache.org>
> > *Subject: *Re: [FEEDBACK] Transitioning Aurora leader election to Apache
> > Curator (`-zk_use_curator`)
> >
> >
> >
> > Now that 0.15.0 has been released, I thought I'd check in on any progress
> > folks have made with testing/deploying the 0.14.0+ with the Aurora
> > Scheduler `-zk_use_curator` flag in-place.
> >
> > There has been 1 fix that will go out in the 0.16.0 release to reduce
> > logger noise on shutdown [1][2] but I have heard no negative (or
> positive)
> > feedback otherwise.
> >
> >
> >
> > [1] https://issues.apache.org/jira/browse/AURORA-1729
> >
> > [2] https://reviews.apache.org/r/49578/
> >
> >
> >
> > On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org> wrote:
> >
> >
> >
> >
> >
> > On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin <
> > martin.hrabovcin@gmail.com> wrote:
> >
> > How should this flag be rolled out to an existing running cluster? Can it
> > be done with a rolling update, instance by instance, or do we need to stop
> > the whole cluster and then bring all nodes up with the new flag?
> >
> >
> >
> > I recommend a whole cluster down, upgrade +  new flag, up.
> >
> >
> >
> > A rolling update should work, but will likely be rocky.  My analysis:
> >
> >
> >
> > The Aurora leader election consists of 2 components, the actual leader
> > election and the resulting advertisement by the leader of itself as the
> > Aurora service endpoint.  These 2 components each use zookeeper and of
> the
> > 2 I only ensured that the advertisement was compatible with old releases
> > (old clients). The leader election portion is completely internal to the
> > Aurora scheduler instances vying for leadership and, under Curator, uses
> a
> > different (enhanced), zookeeper node scheme.  As a result, this is what
> > could happen in a slow roll:
> >
> >
> >
> > before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
> >
> > upgrade 0: new-lead, 1: old-lead, 2: old-follow
> >
> >
> >
> > Here, node 0 will see itself as leader and nodes 1 and 2 will see node 1
> > as leader. The result will be both node 0 and node 1 attempting to read
> the
> > mesos distributed log.  Now the log uses its own leader election and the
> > reader must be the leader as things stand, so the Aurora-level leadership
> > "tie" will be broken by one of the 2 Aurora-level leaders failing to
> become
> > the mesos distributed log leader, and that node will restart its
> lifecycle
> > - ie flap.  This will continue to be the case with second node upgrade
> and
> > will not stabilize until the 3rd node is upgraded.
> >
> >
> >
> >
> >
> > 2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
> >
> > +1, will enable on our test clusters to help verify
> >
> > -Jake
> >
> >
> > On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org> wrote:
> >
> > > I'd like to move forward with
> > > https://issues.apache.org/jira/browse/AURORA-1669 ASAP; i.e. removing the
> > > legacy (Twitter) commons ZooKeeper libraries used for Aurora leader
> > > election in favor of Apache Curator libraries. The change submitted in
> > > https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0, and
> > > Apache Curator based service discovery can be enabled with the Aurora
> > > scheduler flag `-zk_use_curator`.  I'd like feedback from users who enable
> > > this option.  If you have a test cluster where you can enable
> > > `-zk_use_curator` and exercise leader failure and failover, I'd be
> > > grateful. If you have moved to using this option in production with
> > > demonstrable improvements or even maintenance of the status quo, I'd also
> > > be grateful for this news. If you've found regressions or new bugs, I'd
> > > love to know about those as well.
> > >
> > > Thanks in advance to all those who find time to test this out on real
> > > systems!
> > >
> >
> >
> >
> >
> >
> >
> >
> >
>

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Posted by Joshua Cohen <jc...@apache.org>.

I have this enabled in a test cluster and have not noticed any issues with
it yet. I'd like to roll it out to production before we drop the old code
though.

On Wed, Aug 24, 2016 at 1:10 PM, Zameer Manji <zm...@apache.org> wrote:

> Could we change the default and drop the old code at the same time? I don't
> see any benefit of letting that hang around.
>
> I have not tested this code yet, but I hope to do it soon.
>
> On Wed, Aug 24, 2016 at 5:19 AM, Erb, Stephan <Stephan.Erb@blue-yonder.com>
> wrote:
>
> > The curator backend has been working well for us so far. I believe it is
> > safe to make it the default for the next release, and to drop the old code
> > in the release after that.
> >
> >
> >
> > *From: *John Sirois <js...@apache.org>
> > *Reply-To: *"user@aurora.apache.org" <us...@aurora.apache.org>,
> > "jsirois@apache.org" <js...@apache.org>
> > *Date: *Thursday 7 July 2016 at 01:13
> > *To: *Martin Hrabovčin <ma...@gmail.com>
> > *Cc: *"dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell
> > <jfarrell@apache.org>, "user@aurora.apache.org" <us...@aurora.apache.org>
> > *Subject: *Re: [FEEDBACK] Transitioning Aurora leader election to Apache
> > Curator (`-zk_use_curator`)
> >
> >
> >
> > Now that 0.15.0 has been released, I thought I'd check in on any progress
> > folks have made with testing/deploying 0.14.0+ with the Aurora
> > scheduler `-zk_use_curator` flag in place.
> >
> > There has been 1 fix, which will go out in the 0.16.0 release, to reduce
> > logger noise on shutdown [1][2], but I have heard no negative (or positive)
> > feedback otherwise.
> >
> >
> >
> > [1] https://issues.apache.org/jira/browse/AURORA-1729
> >
> > [2] https://reviews.apache.org/r/49578/
> >
> >
> >
> > On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org> wrote:
> >
> >
> >
> >
> >
> > On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin
> > <martin.hrabovcin@gmail.com> wrote:
> >
> > How should this flag be rolled out to an existing running cluster? Can it be
> > done using a rolling update, instance by instance, or do we need to stop the
> > whole cluster and then bring all nodes back up with the new flag?
> >
> >
> >
> > I recommend taking the whole cluster down, upgrading with the new flag, and
> > bringing it back up.
> >
> >
> >
> > A rolling update should work, but will likely be rocky.  My analysis:
> >
> >
> >
> > The Aurora leader election consists of 2 components: the actual leader
> > election, and the resulting advertisement by the leader of itself as the
> > Aurora service endpoint.  Each of these 2 components uses ZooKeeper, and of
> > the 2 I only ensured that the advertisement was compatible with old releases
> > (old clients). The leader election portion is completely internal to the
> > Aurora scheduler instances vying for leadership and, under Curator, uses a
> > different (enhanced) ZooKeeper node scheme.  As a result, this is what
> > could happen in a slow roll:
> >
> >
> >
> > before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
> >
> > after upgrading node 0: 0: new-lead, 1: old-lead, 2: old-follow
> >
> >
> >
> > Here, node 0 will see itself as leader and nodes 1 and 2 will see node 1
> > as leader. The result will be both node 0 and node 1 attempting to read the
> > Mesos distributed log.  Now, the log uses its own leader election, and the
> > reader must be the leader as things stand, so the Aurora-level leadership
> > "tie" will be broken by one of the 2 Aurora-level leaders failing to become
> > the Mesos distributed log leader, and that node will restart its lifecycle,
> > i.e. flap.  This will continue to be the case after the second node is
> > upgraded and will not stabilize until the 3rd node is upgraded.
> >
> >
> >
> >
> >
> > 2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
> >
> > +1, will enable on our test clusters to help verify
> >
> > -Jake
> >
> >
> > On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org> wrote:
> >
> > > I'd like to move forward with
> > > https://issues.apache.org/jira/browse/AURORA-1669 ASAP; i.e. removing the
> > > legacy (Twitter) commons ZooKeeper libraries used for Aurora leader
> > > election in favor of Apache Curator libraries. The change submitted in
> > > https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0, and
> > > Apache Curator based service discovery can be enabled with the Aurora
> > > scheduler flag `-zk_use_curator`.  I'd like feedback from users who enable
> > > this option.  If you have a test cluster where you can enable
> > > `-zk_use_curator` and exercise leader failure and failover, I'd be
> > > grateful. If you have moved to using this option in production with
> > > demonstrable improvements or even maintenance of the status quo, I'd also
> > > be grateful for this news. If you've found regressions or new bugs, I'd
> > > love to know about those as well.
> > >
> > > Thanks in advance to all those who find time to test this out on real
> > > systems!
> > >
> >
> >
> >
> >
> >
> >
> >
> >
>

Re: [FEEDBACK] Transitioning Aurora leader election to Apache Curator (`-zk_use_curator`)

Posted by Zameer Manji <zm...@apache.org>.

Could we change the default and drop the old code at the same time? I don't
see any benefit of letting that hang around.

I have not tested this code yet, but I hope to do it soon.

On Wed, Aug 24, 2016 at 5:19 AM, Erb, Stephan <St...@blue-yonder.com>
wrote:

> The curator backend has been working well for us so far. I believe it is
> safe to make it the default for the next release, and to drop the old code
> in the release after that.
>
>
>
> *From: *John Sirois <js...@apache.org>
> *Reply-To: *"user@aurora.apache.org" <us...@aurora.apache.org>,
> "jsirois@apache.org" <js...@apache.org>
> *Date: *Thursday 7 July 2016 at 01:13
> *To: *Martin Hrabovčin <ma...@gmail.com>
> *Cc: *"dev@aurora.apache.org" <de...@aurora.apache.org>, Jake Farrell
> <jfarrell@apache.org>, "user@aurora.apache.org" <us...@aurora.apache.org>
> *Subject: *Re: [FEEDBACK] Transitioning Aurora leader election to Apache
> Curator (`-zk_use_curator`)
>
>
>
> Now that 0.15.0 has been released, I thought I'd check in on any progress
> folks have made with testing/deploying 0.14.0+ with the Aurora
> scheduler `-zk_use_curator` flag in place.
>
> There has been 1 fix, which will go out in the 0.16.0 release, to reduce
> logger noise on shutdown [1][2], but I have heard no negative (or positive)
> feedback otherwise.
>
>
>
> [1] https://issues.apache.org/jira/browse/AURORA-1729
>
> [2] https://reviews.apache.org/r/49578/
>
>
>
> On Thu, Jun 16, 2016 at 1:18 PM, John Sirois <js...@apache.org> wrote:
>
>
>
>
>
> On Thu, Jun 16, 2016 at 12:03 AM, Martin Hrabovčin
> <martin.hrabovcin@gmail.com> wrote:
>
> How should this flag be rolled out to an existing running cluster? Can it be
> done using a rolling update, instance by instance, or do we need to stop the
> whole cluster and then bring all nodes back up with the new flag?
>
>
>
> I recommend taking the whole cluster down, upgrading with the new flag, and
> bringing it back up.
>
>
>
> A rolling update should work, but will likely be rocky.  My analysis:
>
>
>
> The Aurora leader election consists of 2 components: the actual leader
> election, and the resulting advertisement by the leader of itself as the
> Aurora service endpoint.  Each of these 2 components uses ZooKeeper, and of
> the 2 I only ensured that the advertisement was compatible with old releases
> (old clients). The leader election portion is completely internal to the
> Aurora scheduler instances vying for leadership and, under Curator, uses a
> different (enhanced) ZooKeeper node scheme.  As a result, this is what
> could happen in a slow roll:
>
>
>
> before upgrade: 0: old-lead, 1: old-follow, 2: old-follow
>
> after upgrading node 0: 0: new-lead, 1: old-lead, 2: old-follow
>
>
>
> Here, node 0 will see itself as leader and nodes 1 and 2 will see node 1
> as leader. The result will be both node 0 and node 1 attempting to read the
> Mesos distributed log.  Now, the log uses its own leader election, and the
> reader must be the leader as things stand, so the Aurora-level leadership
> "tie" will be broken by one of the 2 Aurora-level leaders failing to become
> the Mesos distributed log leader, and that node will restart its lifecycle,
> i.e. flap.  This will continue to be the case after the second node is
> upgraded and will not stabilize until the 3rd node is upgraded.
>
>
>
>
>
> 2016-06-16 5:03 GMT+02:00 Jake Farrell <jf...@apache.org>:
>
> +1, will enable on our test clusters to help verify
>
> -Jake
>
>
> On Tue, Jun 14, 2016 at 7:43 PM, John Sirois <js...@apache.org> wrote:
>
> > I'd like to move forward with
> > https://issues.apache.org/jira/browse/AURORA-1669 ASAP; i.e. removing the
> > legacy (Twitter) commons ZooKeeper libraries used for Aurora leader
> > election in favor of Apache Curator libraries. The change submitted in
> > https://reviews.apache.org/r/46286/ is now live in Aurora 0.14.0, and
> > Apache Curator based service discovery can be enabled with the Aurora
> > scheduler flag `-zk_use_curator`.  I'd like feedback from users who enable
> > this option.  If you have a test cluster where you can enable
> > `-zk_use_curator` and exercise leader failure and failover, I'd be
> > grateful. If you have moved to using this option in production with
> > demonstrable improvements or even maintenance of the status quo, I'd also
> > be grateful for this news. If you've found regressions or new bugs, I'd
> > love to know about those as well.
> >
> > Thanks in advance to all those who find time to test this out on real
> > systems!
> >
>
>
>
>
>
>
>
>
