Posted to dev@aurora.apache.org by Brian Wickman <wi...@apache.org> on 2015/01/22 22:27:10 UTC

Thermos external component deprecation plan

Thermos is a standalone task execution system that is not coupled to Aurora
or Mesos.  This is why, by default, Thermos writes its state outside the
sandbox (to /var/run/thermos) and has its own observability system (the
Thermos observer) and CLI (thermos).

Aurora built a Thermos executor as its default executor, but the scheduler
is not architecturally tied to Thermos (or vice versa).  To make things
work smoothly with this decoupling, a Thermos-specific GC executor is also
necessary to clean up the state left over by the execution of Thermos
tasks and to reconcile potential conflicts between the state of the Mesos
master and the Aurora scheduler.

Both the GC executor and Thermos observer violate some of the philosophical
axioms of Mesos (e.g. out-of-sandbox access).  They also significantly
increase the complexity of building, deploying and maintaining Aurora.  I'm
proposing removing both of them as required Aurora components.

In order to do this and make Thermos, Aurora and Mesos play together more
nicely, several things are necessary.

1) Moving /var/run/thermos for each task into the Mesos sandbox

Thermos is a state machine with all state transitions persisted to disk.
Right now this state goes to /var/run/thermos, but it should instead be
persisted somewhere relative to the Mesos sandbox so that the Mesos slave can
garbage collect this state once a Thermos task has completed.

This poses a task detection problem -- the Thermos CLI and Thermos observer
rely upon the existence of /var/run/thermos to know what tasks are running,
so we will need to develop a plugin to detect alternate task roots (see
AURORA-1024 <https://issues.apache.org/jira/browse/AURORA-1024> AURORA-1025
<https://issues.apache.org/jira/browse/AURORA-1025> AURORA-1026
<https://issues.apache.org/jira/browse/AURORA-1026> AURORA-1027
<https://issues.apache.org/jira/browse/AURORA-1027>).
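
To make the detection problem concrete, here is a minimal sketch in Python 3
of how alternate task roots might be discovered.  It is not the plugin
tracked in the tickets above.  The function name detect_task_roots and the
"checkpoints" marker directory are assumptions made for illustration; the
real on-disk layout may differ.

```python
import os


def detect_task_roots(search_paths, marker="checkpoints"):
    """Yield every directory under search_paths that looks like a task root.

    A directory counts as a root when it has a subdirectory named `marker`.
    Both the marker name and this layout are assumptions for the sketch.
    """
    for base in search_paths:
        for dirpath, dirnames, _filenames in os.walk(base):
            if marker in dirnames:
                yield dirpath
                # Do not descend into a root that was already detected.
                dirnames[:] = []
```

A caller could pass both the legacy location and the directory that holds
the Mesos sandboxes, so that tasks in either place are found during a
migration.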

2) Making the Thermos executor responsible for the Thermos UI

In order to make the Thermos observer an optional component, the Thermos
executor will need to assume Thermos observer responsibilities.  Since the
Mesos slave already provides a webserver to serve executor sandboxes, I am
proposing that the Thermos executor generate static HTML content that can
be served by the Mesos slave as a UI.  This means that the executor can
remain lean (no embedded webserver).  See AURORA-725
<https://issues.apache.org/jira/browse/AURORA-725> and AURORA-777
<https://issues.apache.org/jira/browse/AURORA-777>.
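
As a rough illustration of the static-content idea, the Python 3 sketch
below renders a process table into an HTML file inside a sandbox directory.
The function name write_status_page, the file name thermos_status.html and
the shape of the `processes` mapping are all hypothetical; nothing here is
taken from the executor's actual code.

```python
import html
import os


def write_status_page(sandbox_dir, task_id, processes,
                      filename="thermos_status.html"):
    """Write a static status page into sandbox_dir and return its path.

    `processes` maps a process name to a state string (assumed shape).
    """
    rows = "\n".join(
        "<tr><td>{}</td><td>{}</td></tr>".format(
            html.escape(name), html.escape(state))
        for name, state in sorted(processes.items())
    )
    page = (
        "<!DOCTYPE html>\n"
        "<html><head><title>{title}</title></head><body>\n"
        "<h1>{title}</h1>\n"
        "<table>\n<tr><th>Process</th><th>State</th></tr>\n{rows}\n</table>\n"
        "</body></html>\n"
    ).format(title=html.escape(task_id), rows=rows)

    path = os.path.join(sandbox_dir, filename)
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(page)
    # Swap the file into place so a reader never sees a half-written page.
    os.replace(tmp_path, path)
    return path
```

The executor would rewrite the page whenever task state changes, and the
slave's existing sandbox file server would serve it with no extra
listening port.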

3) Making the Aurora scheduler responsible for state reconciliation

The last component that should be removed is the GC executor.  The GC
executor performs the important task of state reconciliation, but this is
now supported directly by the Mesos master.  See AURORA-715
<https://issues.apache.org/jira/browse/AURORA-715> and specifically
AURORA-1047 <https://issues.apache.org/jira/browse/AURORA-1047>.
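
The comparison at the heart of reconciliation can be sketched in a few lines
of Python 3.  This is a simplified model, not the Mesos or Aurora API: the
function name reconcile, the two task_id -> state mappings and the state
strings are assumptions chosen for illustration.

```python
def reconcile(scheduler_view, master_view):
    """Compare two task_id -> state mappings.

    Returns a tuple (updates, missing, unknown):
      updates: tasks known to both sides whose states disagree, with the
               master's state
      missing: tasks the scheduler tracks but the master did not report
      unknown: tasks the master reported but the scheduler does not track
    """
    updates = {
        task_id: state
        for task_id, state in master_view.items()
        if task_id in scheduler_view and scheduler_view[task_id] != state
    }
    missing = sorted(set(scheduler_view) - set(master_view))
    unknown = sorted(set(master_view) - set(scheduler_view))
    return updates, missing, unknown
```

In this model the scheduler would apply `updates` to its own store, treat
`missing` tasks as lost, and ask for `unknown` tasks to be killed, which is
roughly the work the GC executor does today.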

Lastly, this work should make it much easier to support alternate executor
implementations (including the Mesos default executor) from Aurora once a
proper Aurora API (AURORA-987
<https://issues.apache.org/jira/browse/AURORA-987>) is available.

~brian

Re: Thermos external component deprecation plan

Posted by Steve Niemitz <st...@tellapart.com>.
I agree with everything here.  A big pain point from the docker integration
side was/is the observer, and rolling the observer functionality into the
executor would simplify things greatly.


Re: Thermos external component deprecation plan

Posted by Bill Farner <wf...@apache.org>.
+1, thanks for the braindump, Brian!  This sounds great.

-=Bill


Re: Thermos external component deprecation plan

Posted by Joe Smith <ya...@gmail.com>.
Thanks for the write up!

> On Jan 22, 2015, at 13:27, Brian Wickman <wi...@apache.org> wrote:
> 
> 3) Making the Aurora scheduler responsible for state reconciliation
> 
> The last component that should be removed is the GC executor.  The GC
> executor performs the important task of state reconciliation, but this is
> now supported directly by the Mesos master.  See AURORA-715
> <https://issues.apache.org/jira/browse/AURORA-715> and specifically
> AURORA-1047 <https://issues.apache.org/jira/browse/AURORA-1047>.

Although the trusty gc_executor has been solid for a long time, removing it would definitely simplify things, so +10.

