
Posted to dev@aurora.apache.org by Bill Farner <wf...@apache.org> on 2014/07/25 20:41:45 UTC

Proposal: Centralizing update orchestration in Aurora

Hi all,

Rolling updates of services are a crucial feature in Aurora. As such, we
want to take great care when changing its behavior. Today, Aurora operates
by delegating this functionality to the client (or any API client, for that
matter). While this has provided a nice abstraction, it turns out there are
some shortcomings with this approach:

  1. Visibility: since the scheduler does not know about updates, it cannot
display useful information about an in-progress update
  2. Visibility: for two users to diagnose a failed update, they must be at
the same terminal, or copy/paste terminal output
  3. Usability: the scheduler has no means to show information about how an
application's packages or configuration changed over time
  4. Usability: update orchestration in the client means a lost connection
to the scheduler halts an update

Some of the above issues can be addressed by moving update orchestration to
a service external to the scheduler. At first glance, this approach is
attractive, as there is a firm separation of concerns. However, there are a
few pitfalls with this approach:

  1. Usability: setup and maintenance of an Aurora cluster becomes even
more complicated (additional service + storage system)
  2. Usability: the user interface becomes more complicated to stitch
together, as end-users really should only have to visit one website to view
job information.
  3. Complexity: implementing a new production-ready service from scratch
will take a non-trivial amount of time

With these issues in mind, I propose that the scheduler take over the
responsibility of application update orchestration. This will allow us to
solve the current design shortcomings, without the pitfalls of the separate
service approach.

I'm interested in thoughts others have on this. Does the reasoning seem
sound? Are there things I'm missing?


-=Bill

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Bill Farner <wf...@apache.org>.

Thanks for chiming in, everyone.  We will be tracking the work
with AURORA-610 [1].


[1] https://issues.apache.org/jira/browse/AURORA-610


-=Bill


On Fri, Jul 25, 2014 at 6:18 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Thanks for clarifying. Makes sense to me.

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Maxim Khutornenko <ma...@apache.org>.

Thanks for clarifying. Makes sense to me.

On Fri, Jul 25, 2014 at 5:14 PM, Bill Farner <wf...@apache.org> wrote:
> Only the API methods on the scheduler; I propose that the client adopt the
> scheduler's update orchestration and we delete the equivalent code from the
> client.
>
> -=Bill

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Bill Farner <wf...@apache.org>.

Only the API methods on the scheduler; I propose that the client adopt the
scheduler's update orchestration and we delete the equivalent code from the
client.

-=Bill


On Fri, Jul 25, 2014 at 3:54 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> I am a bit confused. Are you suggesting we retain the current client
> updater algorithm or only the scheduler primitives it currently
> employs?

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Maxim Khutornenko <ma...@apache.org>.

I am a bit confused. Are you suggesting we retain the current client
updater algorithm or only the scheduler primitives it currently
employs?

On Fri, Jul 25, 2014 at 3:36 PM, Bill Farner <wf...@apache.org> wrote:
> Yeah, absolutely - we will retain AURORA-383
> <https://issues.apache.org/jira/browse/AURORA-383> for that.
>
> -=Bill

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Bill Farner <wf...@apache.org>.

Yeah, absolutely - we will retain AURORA-383
<https://issues.apache.org/jira/browse/AURORA-383> for that.

-=Bill


On Fri, Jul 25, 2014 at 2:48 PM, Brian Wickman <wi...@apache.org> wrote:

> The scheduler API should know when jobs are locked, though, right?  That
> information could be made available to the UI.

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Brian Wickman <wi...@apache.org>.

The scheduler API should know when jobs are locked, though, right?  That
information could be made available to the UI.


On Fri, Jul 25, 2014 at 2:40 PM, Bill Farner <wf...@apache.org> wrote:

> I think the current API primitives used for updates (kill, add) will
> continue to make sense, so a client could implement updates that way.
> However, these will not appear as updates to the scheduler.
>
> -=Bill

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Bill Farner <wf...@apache.org>.

I think the current API primitives used for updates (kill, add) will
continue to make sense, so a client could implement updates that way.
However, these will not appear as updates to the scheduler.

-=Bill
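
As a rough illustration of a client implementing its own update on top of
the kill and add primitives, the sketch below splits a job's instances into
staged batches (for example 1% -> 10% -> 25% -> 25% -> 25% -> rest) and
replaces them one batch at a time. Everything here is an assumption made for
illustration: the function names, the percentage-based stage list, and the
kill/add callbacks are hypothetical stand-ins rather than Aurora's actual
client or scheduler API, and health checking and rollback are left out.

```python
from typing import Callable, List, Sequence


def staged_batches(instance_ids: Sequence[int],
                   percents: Sequence[float]) -> List[List[int]]:
    """Split instance_ids into consecutive batches sized by percent of total.

    Instances left over after the listed stages form a final "rest" batch.
    """
    total = len(instance_ids)
    batches: List[List[int]] = []
    start = 0
    for pct in percents:
        if start >= total:
            break
        # Always make progress: at least one instance per stage.
        size = max(1, int(total * pct / 100))
        batches.append(list(instance_ids[start:start + size]))
        start += size
    if start < total:
        batches.append(list(instance_ids[start:]))
    return batches


def client_side_update(instance_ids: Sequence[int],
                       percents: Sequence[float],
                       kill: Callable[[List[int]], None],
                       add: Callable[[List[int]], None]) -> None:
    """Drive an update batch by batch using only kill/add callbacks."""
    for batch in staged_batches(instance_ids, percents):
        kill(batch)  # hypothetical stand-in for the kill primitive
        add(batch)   # hypothetical stand-in for the add primitive
```

Because the scheduler in this sketch only ever sees individual kill and add
calls, it has no way to tell that they belong to one update, which is the
visibility gap discussed in this thread.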


On Fri, Jul 25, 2014 at 2:31 PM, Maxim Khutornenko <ma...@apache.org> wrote:

> Retaining the client update algorithm would require extra work on the scheduler
> side to satisfy the visibility requirements Bill outlined above, which may not
> be worth the effort. That would also create ground for inconsistent update
> expectations and experience.
>
>
> On Fri, Jul 25, 2014 at 1:34 PM, Brian Wickman <wi...@apache.org> wrote:
>
> > Will the API for client-side updates still exist?  Will the client continue
> > to have its own implementation of 'update' (or perhaps an 'update --local'
> > flag?)  The reason I ask is whether customers should continue to have the
> > flexibility to implement their own update algorithms (e.g. 1% -> 10% -> 25%
> > -> 25% -> 25% -> rest.)
> >
> >
> > On Fri, Jul 25, 2014 at 11:41 AM, Bill Farner <wf...@apache.org>
> wrote:
> >
> > > Hi all,
> > >
> > > Rolling updates of services is a crucial feature in Aurora. As such, we
> > > want to take great care when changing its behavior. Today, Aurora operates
> > > by delegating this functionality to the client (or any API client, for that
> > > matter). While this has provided a nice abstraction, it turns out there are
> > > some shortcomings with this approach:
> > >
> > >   1. Visibility: since the scheduler does not know about updates, it cannot
> > > display useful information about an in-progress update
> > >   2. Visibility: for two users to diagnose a failed update, they must be at
> > > the same terminal, or copy/paste terminal output
> > >   3. Usability: the scheduler has no means to show information about how an
> > > application's packages or configuration changed over time
> > >   4. Usability: update orchestration in the client means a lost connection
> > > to the scheduler halts an update
> > >
> > > Some of the above issues can be addressed by moving update orchestration to
> > > a service external to the scheduler. At first glance, this approach is
> > > attractive, as there is a firm separation of concerns. However, there are a
> > > few pitfalls with this approach:
> > >
> > >   1. Usability: setup and maintenance of an aurora cluster becomes even
> > > more complicated (additional service + storage system)
> > >   2. Usability: the user interface becomes more complicated to stitch
> > > together, as end-users really should only have to visit one website to view
> > > job information.
> > >   3. Complexity: implementing a new production-ready service from scratch
> > > will take a non-trivial amount of time
> > >
> > > With these issues in mind, I propose that the scheduler take over the
> > > responsibility of application update orchestration. This will allow us to
> > > solve the current design shortcomings, without the pitfalls of the separate
> > > service approach.
> > >
> > > I'm interested in thoughts others have on this. Does the reasoning seem
> > > sound? Are there things i'm missing?
> > >
> > >
> > > -=Bill
> > >
> >
>

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Maxim Khutornenko <ma...@apache.org>.

Retaining the client update algorithm would require extra work on the scheduler
side to satisfy the visibility requirements Bill outlined above, which may not
be worth the effort. That would also create ground for inconsistent update
expectations and experience.


On Fri, Jul 25, 2014 at 1:34 PM, Brian Wickman <wi...@apache.org> wrote:

> Will the API for client-side updates still exist?  Will the client continue
> to have its own implementation of 'update' (or perhaps an 'update --local'
> flag?)  The reason I ask is whether customers should continue to have the
> flexibility to implement their own update algorithms (e.g. 1% -> 10% -> 25%
> -> 25% -> 25% -> rest.)
>
>
> On Fri, Jul 25, 2014 at 11:41 AM, Bill Farner <wf...@apache.org> wrote:
>
> > Hi all,
> >
> > Rolling updates of services is a crucial feature in Aurora. As such, we
> > want to take great care when changing its behavior. Today, Aurora operates
> > by delegating this functionality to the client (or any API client, for that
> > matter). While this has provided a nice abstraction, it turns out there are
> > some shortcomings with this approach:
> >
> >   1. Visibility: since the scheduler does not know about updates, it cannot
> > display useful information about an in-progress update
> >   2. Visibility: for two users to diagnose a failed update, they must be at
> > the same terminal, or copy/paste terminal output
> >   3. Usability: the scheduler has no means to show information about how an
> > application's packages or configuration changed over time
> >   4. Usability: update orchestration in the client means a lost connection
> > to the scheduler halts an update
> >
> > Some of the above issues can be addressed by moving update orchestration to
> > a service external to the scheduler. At first glance, this approach is
> > attractive, as there is a firm separation of concerns. However, there are a
> > few pitfalls with this approach:
> >
> >   1. Usability: setup and maintenance of an aurora cluster becomes even
> > more complicated (additional service + storage system)
> >   2. Usability: the user interface becomes more complicated to stitch
> > together, as end-users really should only have to visit one website to view
> > job information.
> >   3. Complexity: implementing a new production-ready service from scratch
> > will take a non-trivial amount of time
> >
> > With these issues in mind, I propose that the scheduler take over the
> > responsibility of application update orchestration. This will allow us to
> > solve the current design shortcomings, without the pitfalls of the separate
> > service approach.
> >
> > I'm interested in thoughts others have on this. Does the reasoning seem
> > sound? Are there things i'm missing?
> >
> >
> > -=Bill
> >
>

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Brian Wickman <wi...@apache.org>.

Will the API for client-side updates still exist?  Will the client continue
to have its own implementation of 'update' (or perhaps an 'update --local'
flag?)  The reason I ask is whether customers should continue to have the
flexibility to implement their own update algorithms (e.g. 1% -> 10% -> 25%
-> 25% -> 25% -> rest.)
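
A custom schedule like this one could be computed on the client roughly as
follows. This is a minimal Python sketch: `staged_batches` and its
round-up rule are assumptions for illustration, not an existing Aurora
client function.

```python
import math
from typing import List


def staged_batches(instance_count: int, percents: List[float]) -> List[range]:
    """Split instance IDs 0..instance_count-1 into consecutive batches.

    Each entry in `percents` is a share of the total job size. Whatever
    remains after the listed stages forms a final "rest" batch.
    """
    batches = []
    start = 0
    for pct in percents:
        if start >= instance_count:
            break  # every instance is already assigned to a batch
        # Round up so that small jobs still update at least one instance.
        size = max(1, math.ceil(instance_count * pct / 100))
        end = min(instance_count, start + size)
        batches.append(range(start, end))
        start = end
    if start < instance_count:
        batches.append(range(start, instance_count))  # the "rest" stage
    return batches
```

For a 200-instance job, `staged_batches(200, [1, 10, 25, 25, 25])` yields
batches of 2, 20, 50, 50, and 50 instances, followed by the remaining 28.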


On Fri, Jul 25, 2014 at 11:41 AM, Bill Farner <wf...@apache.org> wrote:

> Hi all,
>
> Rolling updates of services is a crucial feature in Aurora. As such, we
> want to take great care when changing its behavior. Today, Aurora operates
> by delegating this functionality to the client (or any API client, for that
> matter). While this has provided a nice abstraction, it turns out there are
> some shortcomings with this approach:
>
>   1. Visibility: since the scheduler does not know about updates, it cannot
> display useful information about an in-progress update
>   2. Visibility: for two users to diagnose a failed update, they must be at
> the same terminal, or copy/paste terminal output
>   3. Usability: the scheduler has no means to show information about how an
> application's packages or configuration changed over time
>   4. Usability: update orchestration in the client means a lost connection
> to the scheduler halts an update
>
> Some of the above issues can be addressed by moving update orchestration to
> a service external to the scheduler. At first glance, this approach is
> attractive, as there is a firm separation of concerns. However, there are a
> few pitfalls with this approach:
>
>   1. Usability: setup and maintenance of an aurora cluster becomes even
> more complicated (additional service + storage system)
>   2. Usability: the user interface becomes more complicated to stitch
> together, as end-users really should only have to visit one website to view
> job information.
>   3. Complexity: implementing a new production-ready service from scratch
> will take a non-trivial amount of time
>
> With these issues in mind, I propose that the scheduler take over the
> responsibility of application update orchestration. This will allow us to
> solve the current design shortcomings, without the pitfalls of the separate
> service approach.
>
> I'm interested in thoughts others have on this. Does the reasoning seem
> sound? Are there things i'm missing?
>
>
> -=Bill
>

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Toby Weingartner <tw...@twopensource.com>.

Possible pros to having the scheduler do the updates:

 - The scheduler likely has the most direct information with respect to
job/task SLA-style metrics, and can use these to help keep jobs
within SLA during an update.
 - If the updates are given as a "rate of change", then if/when tasks fail in
large jobs, the update rate may be adjusted automatically to stay within
SLA, and the scheduler could opportunistically replace a failed task
with one running the new configuration.

-Toby.


On Fri, Jul 25, 2014 at 11:41 AM, Bill Farner <wf...@apache.org> wrote:

> Hi all,
>
> Rolling updates of services is a crucial feature in Aurora. As such, we
> want to take great care when changing its behavior. Today, Aurora operates
> by delegating this functionality to the client (or any API client, for that
> matter). While this has provided a nice abstraction, it turns out there are
> some shortcomings with this approach:
>
>   1. Visibility: since the scheduler does not know about updates, it cannot
> display useful information about an in-progress update
>   2. Visibility: for two users to diagnose a failed update, they must be at
> the same terminal, or copy/paste terminal output
>   3. Usability: the scheduler has no means to show information about how an
> application's packages or configuration changed over time
>   4. Usability: update orchestration in the client means a lost connection
> to the scheduler halts an update
>
> Some of the above issues can be addressed by moving update orchestration to
> a service external to the scheduler. At first glance, this approach is
> attractive, as there is a firm separation of concerns. However, there are a
> few pitfalls with this approach:
>
>   1. Usability: setup and maintenance of an aurora cluster becomes even
> more complicated (additional service + storage system)
>   2. Usability: the user interface becomes more complicated to stitch
> together, as end-users really should only have to visit one website to view
> job information.
>   3. Complexity: implementing a new production-ready service from scratch
> will take a non-trivial amount of time
>
> With these issues in mind, I propose that the scheduler take over the
> responsibility of application update orchestration. This will allow us to
> solve the current design shortcomings, without the pitfalls of the separate
> service approach.
>
> I'm interested in thoughts others have on this. Does the reasoning seem
> sound? Are there things i'm missing?
>
>
> -=Bill
>

Re: Proposal: Centralizing update orchestration in Aurora

Posted by Bill Farner <wf...@apache.org>.

>
> Can the service just use the Mesos core state abstraction?  That comes
> along as a free dependency when setting up an Aurora cluster.


If we take the separate service approach, I probably would not use the
replicated log.  In the scheduler, we're already contemplating moving away
from it due to the amount of database reimplementation required.  We would
likely use an off-the-shelf RDBMS.

> I assume 3 is a ~wash in terms of time with productionizing new scheduler
> code.


The two big things we would need to build ~from scratch are storage and
authentication/authorization.  Also, we would need to come up with an
answer for auth delegation with a separate service.


-=Bill


On Fri, Jul 25, 2014 at 11:56 AM, John Sirois <jo...@gmail.com> wrote:

> Inline
>
> On Fri, Jul 25, 2014 at 12:41 PM, Bill Farner <wf...@apache.org> wrote:
>
> > Hi all,
> >
> > Rolling updates of services is a crucial feature in Aurora. As such, we
> > want to take great care when changing its behavior. Today, Aurora operates
> > by delegating this functionality to the client (or any API client, for that
> > matter). While this has provided a nice abstraction, it turns out there are
> > some shortcomings with this approach:
> >
> >   1. Visibility: since the scheduler does not know about updates, it cannot
> > display useful information about an in-progress update
> >   2. Visibility: for two users to diagnose a failed update, they must be at
> > the same terminal, or copy/paste terminal output
> >   3. Usability: the scheduler has no means to show information about how an
> > application's packages or configuration changed over time
> >   4. Usability: update orchestration in the client means a lost connection
> > to the scheduler halts an update
> >
> > Some of the above issues can be addressed by moving update orchestration to
> > a service external to the scheduler. At first glance, this approach is
> > attractive, as there is a firm separation of concerns. However, there are a
> > few pitfalls with this approach:
> >
> >   1. Usability: setup and maintenance of an aurora cluster becomes even
> > more complicated (additional service + storage system)
> >
>
> Can the service just use the Mesos core state abstraction?  That comes
> along as a free dependency when setting up an Aurora cluster.
>
>
> >   2. Usability: the user interface becomes more complicated to stitch
> > together, as end-users really should only have to visit one website to view
> > job information.
> >   3. Complexity: implementing a new production-ready service from scratch
> > will take a non-trivial amount of time
> >
>
> I assume 3 is a ~wash in terms of time with productionizing new scheduler
> code.
>
>
> > With these issues in mind, I propose that the scheduler take over the
> > responsibility of application update orchestration. This will allow us to
> > solve the current design shortcomings, without the pitfalls of the separate
> > service approach.
> >
> > I'm interested in thoughts others have on this. Does the reasoning seem
> > sound? Are there things i'm missing?
> >
> >
> > -=Bill
> >
>

Re: Proposal: Centralizing update orchestration in Aurora

Posted by John Sirois <jo...@gmail.com>.

Inline

On Fri, Jul 25, 2014 at 12:41 PM, Bill Farner <wf...@apache.org> wrote:

> Hi all,
>
> Rolling updates of services is a crucial feature in Aurora. As such, we
> want to take great care when changing its behavior. Today, Aurora operates
> by delegating this functionality to the client (or any API client, for that
> matter). While this has provided a nice abstraction, it turns out there are
> some shortcomings with this approach:
>
>   1. Visibility: since the scheduler does not know about updates, it cannot
> display useful information about an in-progress update
>   2. Visibility: for two users to diagnose a failed update, they must be at
> the same terminal, or copy/paste terminal output
>   3. Usability: the scheduler has no means to show information about how an
> application's packages or configuration changed over time
>   4. Usability: update orchestration in the client means a lost connection
> to the scheduler halts an update
>
> Some of the above issues can be addressed by moving update orchestration to
> a service external to the scheduler. At first glance, this approach is
> attractive, as there is a firm separation of concerns. However, there are a
> few pitfalls with this approach:
>
>   1. Usability: setup and maintenance of an aurora cluster becomes even
> more complicated (additional service + storage system)
>

Can the service just use the Mesos core state abstraction?  That comes
along as a free dependency when setting up an Aurora cluster.


>   2. Usability: the user interface becomes more complicated to stitch
> together, as end-users really should only have to visit one website to view
> job information.
>   3. Complexity: implementing a new production-ready service from scratch
> will take a non-trivial amount of time
>

I assume 3 is a ~wash in terms of time with productionizing new scheduler
code.


> With these issues in mind, I propose that the scheduler take over the
> responsibility of application update orchestration. This will allow us to
> solve the current design shortcomings, without the pitfalls of the separate
> service approach.
>
> I'm interested in thoughts others have on this. Does the reasoning seem
> sound? Are there things i'm missing?
>
>
> -=Bill
>