Posted to dev@aurora.apache.org by Cody G <co...@gmail.com> on 2017/03/03 03:40:55 UTC

Idea: rolling restarts in Aurora

Hi all,

I'd like to implement some new functionality in Aurora allowing for rolling
job restarts. There are many reasons why we might need to restart a job,
e.g. freeing instances of a job from deadlock or refreshing some sort of
external configuration.

Currently, there are two ways to execute a rolling restart, but both are
undesirable: either use the restartShards endpoint and implement batching
client-side, or use startJobUpdate with a slightly modified task config so
that a non-empty job diff forces an update. I propose adding a new Thrift
RPC for launching a rolling restart, which would be an interface around
the existing update logic. Instead of requiring a TaskConfig and an
instanceCount, this restart endpoint would accept only JobUpdateSettings
and would simply launch an update with the task configuration currently in
use. All of the existing job update RPCs would still be able to access
updates launched from this restart endpoint, so restarts would be visible
in the UI and no additional storage changes would be required.
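
To make the idea concrete, here is a rough Python sketch of the shape of the
proposal. It is not Aurora code: FakeScheduler, UpdateSettings,
start_job_update, start_rolling_restart and _launch are all hypothetical
names, the scheduler is an in-memory stand-in, and rejecting an empty diff
is an assumption made for illustration. The only point is that the restart
entry point takes settings alone and reuses the stored task config, while
sharing the same launch path and update history as a normal update.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class UpdateSettings:
    """Hypothetical stand-in for JobUpdateSettings; only batch size is modelled."""
    batch_size: int = 1

    def __post_init__(self):
        if self.batch_size < 1:
            raise ValueError("batch_size must be at least 1")


@dataclass
class FakeScheduler:
    """In-memory stand-in for the scheduler (an assumption, not Aurora's code)."""
    task_configs: Dict[str, dict]    # job key -> task config currently in use
    instance_counts: Dict[str, int]  # job key -> current number of instances
    updates: List[dict] = field(default_factory=list)  # shared update history

    def start_job_update(self, job_key, task_config, instance_count, settings):
        """Existing-style path: the caller supplies the full desired state."""
        unchanged = (task_config == self.task_configs[job_key]
                     and instance_count == self.instance_counts[job_key])
        if unchanged:
            # Assumed behaviour: an empty job diff means there is nothing to do.
            raise ValueError("empty job diff, nothing to update")
        return self._launch(job_key, task_config, instance_count, settings)

    def start_rolling_restart(self, job_key, settings):
        """Proposed-style path: settings only, config is taken from storage."""
        return self._launch(job_key,
                            self.task_configs[job_key],
                            self.instance_counts[job_key],
                            settings)

    def _launch(self, job_key, task_config, instance_count, settings):
        """Shared launch logic: split instances into batches and record the update."""
        instances = list(range(instance_count))
        batches = [instances[i:i + settings.batch_size]
                   for i in range(0, instance_count, settings.batch_size)]
        record = {"job": job_key, "config": task_config, "batches": batches}
        self.updates.append(record)
        self.task_configs[job_key] = task_config
        self.instance_counts[job_key] = instance_count
        return record


if __name__ == "__main__":
    scheduler = FakeScheduler({"www/prod/hello": {"cmd": "run"}},
                              {"www/prod/hello": 5})
    restart = scheduler.start_rolling_restart("www/prod/hello",
                                              UpdateSettings(batch_size=2))
    print(restart["batches"])  # [[0, 1], [2, 3], [4]]
```

Because both entry points append to the same update history in this sketch,
anything that reads that history would see restarts without extra storage,
which mirrors the intent described above.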

If this proposal seems reasonable, I’ll file a ticket and draft up a more
detailed RFC for further review.

Cody

Re: Idea: rolling restarts in Aurora

Posted by Cody G <co...@gmail.com>.
I've drafted a small design document for this change:

https://docs.google.com/document/d/13xm23SfIRy5zMro82Ok8dRCsr7lKcC0_UUO_tJX21wQ/edit?usp=sharing

Any feedback would be greatly appreciated!

On Tue, Mar 7, 2017 at 11:15 AM, Cody G <co...@gmail.com> wrote:

> Created a ticket https://issues.apache.org/jira/browse/AURORA-1900 and
> assigned to myself.
>
> On Fri, Mar 3, 2017 at 11:29 AM, David McLaughlin <dm...@apache.org>
> wrote:
>
>> +1 for thinner client.
>>
>> Another reason rolling update was moved to the Scheduler was to have an
>> audit trail of changes to the job. If we could also get these restarts
>> appearing on the job page, it would be great.
>>
>> On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji <zm...@apache.org> wrote:
>>
>> > +1
>> >
>> > If I recall correctly, the rolling update mechanism was added to Aurora
>> > because having the client coordinate batching was pretty tricky. I think
>> > the same applies here to a rolling restart.
>> >
>> > Considering the job controller technically supports this, adding a new RPC
>> > to expose this behaviour would be beneficial.

Re: Idea: rolling restarts in Aurora

Posted by Cody G <co...@gmail.com>.
Created a ticket https://issues.apache.org/jira/browse/AURORA-1900 and
assigned to myself.

On Fri, Mar 3, 2017 at 11:29 AM, David McLaughlin <dm...@apache.org>
wrote:

> +1 for thinner client.
>
> Another reason rolling update was moved to the Scheduler was to have an
> audit trail of changes to the job. If we could also get these restarts
> appearing on the job page, it would be great.
>
> On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji <zm...@apache.org> wrote:
>
> > +1
> >
> > If I recall correctly, the rolling update mechanism was added to Aurora
> > because having the client coordinate batching was pretty tricky. I think
> > the same applies here to a rolling restart.
> >
> > Considering the job controller technically supports this, adding a new RPC
> > to expose this behaviour would be beneficial.

Re: Idea: rolling restarts in Aurora

Posted by David McLaughlin <dm...@apache.org>.
+1 for thinner client.

Another reason rolling update was moved to the Scheduler was to have an
audit trail of changes to the job. If we could also get these restarts
appearing on the job page, it would be great.

On Fri, Mar 3, 2017 at 11:15 AM, Zameer Manji <zm...@apache.org> wrote:

> +1
>
> If I recall correctly, the rolling update mechanism was added to Aurora
> because having the client coordinate batching was pretty tricky. I think
> the same applies here to a rolling restart.
>
> Considering the job controller technically supports this, adding a new RPC
> to expose this behaviour would be beneficial.

Re: Idea: rolling restarts in Aurora

Posted by Zameer Manji <zm...@apache.org>.
+1

If I recall correctly, the rolling update mechanism was added to Aurora
because having the client coordinate batching was pretty tricky. I think
the same applies here to a rolling restart.

Considering the job controller technically supports this, adding a new RPC
to expose this behaviour would be beneficial.
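
For anyone who has not had to do it, client-side batching looks roughly like
the Python sketch below. Everything in it is hypothetical:
rolling_restart_client_side, restart_batch and wait_until_healthy are
illustrative names, not Aurora client APIs. restart_batch stands in for a
call such as restartShards, and wait_until_healthy for whatever polling the
client would have to do itself.

```python
from typing import Callable, List, Sequence


def rolling_restart_client_side(
    instance_ids: Sequence[int],
    batch_size: int,
    restart_batch: Callable[[List[int]], None],
    wait_until_healthy: Callable[[List[int]], bool],
) -> List[int]:
    """Restart instances batch by batch, driven entirely from the client."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    restarted: List[int] = []
    for start in range(0, len(instance_ids), batch_size):
        batch = list(instance_ids[start:start + batch_size])
        restart_batch(batch)               # hypothetical restart call
        if not wait_until_healthy(batch):  # hypothetical health polling
            # The client must decide what to do here: stop, retry, or roll on.
            raise RuntimeError(
                f"batch {batch} did not become healthy; "
                f"already restarted: {restarted}"
            )
        restarted.extend(batch)
    return restarted
```

If the client process dies partway through this loop, the progress it was
tracking in memory is lost and the restart simply stops, which is the kind
of trouble that coordinating from the client invites.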