
Posted to dev@mesos.apache.org by Benno Evers <be...@mesosphere.com> on 2017/08/23 09:38:49 UTC

Sending TASK_STARTING in the built-in executors

Hi all,

when starting a task, an executor can send out the following status updates:

  - [optional] TASK_STARTING: Sent by the executor when it has received the
    launch command
  - TASK_RUNNING: Sent by the executor when the task is running

The built-in executors currently don't send out TASK_STARTING updates. I
think this discards potentially valuable information, because TASK_RUNNING
informs us about the current status of the task, but not about the status
change.

For example, if the network connection between scheduler and master is
interrupted during task start, the scheduler has no good way to estimate the
task's start time, because the TASK_RUNNING update that it eventually gets
might be a much later one. Also, a task with a long delay between STARTING
and RUNNING looks, to an outside observer, the same as a task that is stuck
in STAGING.
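
To make the start-time argument concrete, here is a minimal sketch of how a scheduler might pick a start-time estimate from the updates it has seen. The `StatusUpdate` class and `estimate_start_time` function are hypothetical names used only for illustration, and the sketch assumes each update carries a timestamp set by its sender; it is not taken from any real framework.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StatusUpdate:
    """Hypothetical, simplified stand-in for a Mesos task status update."""
    state: str        # e.g. "TASK_STARTING" or "TASK_RUNNING"
    timestamp: float  # seconds since the epoch, assumed to be set by the sender


def estimate_start_time(updates: List[StatusUpdate]) -> Optional[float]:
    """Return the best available estimate of when the task began starting.

    Prefers the earliest TASK_STARTING timestamp and falls back to the
    earliest TASK_RUNNING timestamp, which may be much later than the
    real start. Returns None if neither state has been seen.
    """
    for wanted in ("TASK_STARTING", "TASK_RUNNING"):
        stamps = [u.timestamp for u in updates if u.state == wanted]
        if stamps:
            return min(stamps)
    return None


# The scheduler only ever sees a late TASK_RUNNING update.
without_starting = [StatusUpdate("TASK_RUNNING", 1503481200.0)]

# Same task, but the executor also reported TASK_STARTING at launch.
with_starting = [
    StatusUpdate("TASK_STARTING", 1503480000.0),
    StatusUpdate("TASK_RUNNING", 1503481200.0),
]

print(estimate_start_time(without_starting))
print(estimate_start_time(with_starting))
```

With only TASK_RUNNING available, the estimate can be no better than the time of that update; with TASK_STARTING it moves back to the moment the executor received the launch command.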

There is a small risk that sending an additional update could break
existing frameworks. We briefly looked through some of the most popular
open-source frameworks and didn't find any major issues, but of course it's
impossible to do an exhaustive check.

In particular, a framework will break if

 1. it runs tasks using one of the built-in Mesos executors, and
 2. it doesn't handle the possibility of receiving a TASK_STARTING update, and
 3. it reports an error whenever it encounters an unexpected task state in
    an update.
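
As an illustration of the third condition, a framework could avoid breaking by logging and ignoring states it does not know instead of raising an error. The sketch below is an assumption-laden example, not code from any existing framework: `handle_status_update` and the `task_states` dictionary are hypothetical, and only the state names are real Mesos task states.

```python
import logging
from typing import Dict

# Task states this hypothetical framework knows how to handle.
KNOWN_NON_TERMINAL = {"TASK_STAGING", "TASK_STARTING", "TASK_RUNNING"}
KNOWN_TERMINAL = {
    "TASK_FINISHED",
    "TASK_FAILED",
    "TASK_KILLED",
    "TASK_LOST",
    "TASK_ERROR",
}


def handle_status_update(task_states: Dict[str, str], task_id: str, state: str) -> str:
    """Record `state` for `task_id` and classify the update.

    Returns "active" for known non-terminal states, "terminal" for known
    terminal states, and "ignored" for anything else. Unknown states are
    logged rather than treated as an error, so a newly introduced state
    does not break the framework.
    """
    if state in KNOWN_TERMINAL:
        task_states[task_id] = state
        return "terminal"
    if state in KNOWN_NON_TERMINAL:
        task_states[task_id] = state
        return "active"
    logging.warning("Ignoring unknown state %s for task %s", state, task_id)
    return "ignored"


states: Dict[str, str] = {}
print(handle_status_update(states, "task-1", "TASK_STARTING"))
print(handle_status_update(states, "task-1", "TASK_RUNNING"))
print(handle_status_update(states, "task-1", "TASK_SOMETHING_NEW"))
print(states)
```

The design choice here is that an unrecognized state leaves the last known state untouched, so the framework keeps working on stale but valid information.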


If you are aware of any such framework, please speak up so we can consider
it.


Thanks,
-- 
Benno Evers
Software Engineer, Mesosphere

Re: Sending TASK_STARTING in the built-in executors

Posted by Vinod Kone <vi...@apache.org>.

+1 for the change.

On Wed, Aug 23, 2017 at 8:58 AM, Benno Evers <be...@mesosphere.com> wrote:

> I think it's ultimately up to the executor to interpret what "running"
> means exactly. The closest thing to a general definition would probably be
> this from docs/high-availability-framework.md:
>
> > A task transitions to the `TASK_RUNNING` state after it has begun running
> > successfully (if the task fails to start, it transitions to one of the
> > terminal states listed below).
>
> For the current built-in executors, the CommandExecutor sends TASK_RUNNING
> when the process forked successfully and health checks were created, and
> the DockerExecutor when it receives the output of the `docker inspect`
> command for the started container.
>

Re: Sending TASK_STARTING in the built-in executors

Posted by Benno Evers <be...@mesosphere.com>.

I think it's ultimately up to the executor to interpret what "running"
means exactly. The closest thing to a general definition would probably be
this from docs/high-availability-framework.md:

> A task transitions to the `TASK_RUNNING` state after it has begun running
> successfully (if the task fails to start, it transitions to one of the
> terminal states listed below).

For the current built-in executors, the CommandExecutor sends TASK_RUNNING
once the process has forked successfully and the health checks have been
created, and the DockerExecutor sends it once it receives the output of the
`docker inspect` command for the started container.
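
A rough sketch of that ordering for a command-style executor, with the proposed TASK_STARTING update added in front. This is not the CommandExecutor implementation: `launch_task` and the `send_update` callback are hypothetical stand-ins for the real executor API, and health checks are not modelled.

```python
import subprocess
import sys
from typing import Callable, List


def launch_task(argv: List[str], send_update: Callable[[str], None]) -> int:
    """Run `argv` as a task and report its state transitions.

    Sends TASK_STARTING as soon as the launch request is received,
    TASK_RUNNING once the child process has been created, and a terminal
    state when the process exits or cannot be started. Returns the exit
    code, or -1 if the process could not be created.
    """
    send_update("TASK_STARTING")
    try:
        process = subprocess.Popen(argv)
    except OSError:
        # The command could not be started at all.
        send_update("TASK_FAILED")
        return -1
    send_update("TASK_RUNNING")
    exit_code = process.wait()
    send_update("TASK_FINISHED" if exit_code == 0 else "TASK_FAILED")
    return exit_code


# Collect the updates in a list instead of sending them to an agent.
updates: List[str] = []
launch_task([sys.executable, "-c", "pass"], updates.append)
print(updates)
```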

On Wed, Aug 23, 2017 at 4:54 PM, James Peach <jo...@gmail.com> wrote:

>
> > On Aug 23, 2017, at 2:38 AM, Benno Evers <be...@mesosphere.com> wrote:
> >
> > Hi all,
> >
> > when starting a task, an executor can send out the following status
> > updates:
> >
> >  - [optional] TASK_STARTING: Sent by the executor when it received the
> > launch command
> >  - TASK_RUNNING: Sent by the executor when the task is running
>
>
> How is "running" defined?
>


-- 
Benno Evers
Software Engineer, Mesosphere

Re: Sending TASK_STARTING in the built-in executors

Posted by James Peach <jo...@gmail.com>.

> On Aug 23, 2017, at 2:38 AM, Benno Evers <be...@mesosphere.com> wrote:
> 
> Hi all,
> 
> when starting a task, an executor can send out the following status updates:
> 
>  - [optional] TASK_STARTING: Sent by the executor when it received the
> launch command
>  - TASK_RUNNING: Sent by the executor when the task is running


How is "running" defined?


Re: Sending TASK_STARTING in the built-in executors

Posted by Benno Evers <be...@mesosphere.com>.

As a follow-up, this change has now landed and will likely be part of Mesos
1.5.0.

We did our best to verify that we don't accidentally break existing
frameworks, and the only issue we could find was with Chronos (where a fix
has since been merged into the stable and master branches).

If you discover that some framework you depend on would choke on this, now
would be a good time to update it before upgrading to Mesos 1.5.0 ;)

Best regards,

-- 
Benno Evers
Software Engineer, Mesosphere
