Posted to dev@mesos.apache.org by Evers Benno <be...@yandex-team.ru> on 2016/05/03 14:49:19 UTC

Status acknowledgements in MesosExecutor

Hi,

I was wondering about the semantics of the Executor::sendStatusUpdate()
method. It is described as

    // Sends a status update to the framework scheduler, retrying as
    // necessary until an acknowledgement has been received or the
    // executor is terminated (in which case, a TASK_LOST status update
    // will be sent). See Scheduler::statusUpdate for more information
    // about status update acknowledgements.

I read this as saying that the function blocks until an
acknowledgement is received, but looking at the implementation of
MesosExecutor, it seems that "retrying as necessary" only means
re-sending unacknowledged updates when the slave reconnects
(exec/exec.cpp:274).

I'm asking because we're currently running a Python executor that
shuts down like this:

    driver.sendStatusUpdate(_create_task_status(TASK_FINISHED))
    driver.stop()
    # in a different thread:
    sys.exit(0 if driver.run() == mesos_pb2.DRIVER_STOPPED else 1)

and we're seeing situations (roughly once per 10,000 tasks) where the
executor process is reaped before the acknowledgement for TASK_FINISHED
is sent from the slave to the executor. This results in Mesos generating
a TASK_FAILED status update, probably from
Slave::sendExecutorTerminatedStatusUpdate().

So, did I misunderstand how MesosExecutor works? Or is it indeed a race,
and do we have to change our executor shutdown sequence?

Best regards,
Benno

Re: Status acknowledgements in MesosExecutor

Posted by Evers Benno <be...@yandex-team.ru>.
Alex,

Thanks, that put me on the right track.

It seems like the executor driver indeed does not wait for
acknowledgements before stopping, so, as observed by Yan Xu and Vinod
Kone in MESOS-243 <https://issues.apache.org/jira/browse/MESOS-243>:

> The right fix for this is for stop() to wait for ACKs. We will do this
> once acks are plumbed through to the executor.

Without making stop() wait for the ACKs, it seems like MESOS-4111
wouldn't prevent this from happening, since (AFAICT) the race is between
waitpid() and read() on the slave.
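
In the meantime we'll probably just delay the exit a bit after sending
the final update, similar to the sleep hack mentioned in the comment
Alex linked. Roughly something like this (a sketch only;
_create_task_status() is our own helper, and the one-second grace
period is an arbitrary guess, not anything the driver guarantees):

    import sys
    import time

    driver.sendStatusUpdate(_create_task_status(TASK_FINISHED))

    # Crude workaround: give the slave a moment to receive and process the
    # final update before the process goes away. This only shrinks the race
    # window, it does not close it.
    time.sleep(1.0)

    driver.stop()
    # in a different thread:
    sys.exit(0 if driver.run() == mesos_pb2.DRIVER_STOPPED else 1)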

Best regards,
Benno

On 03.05.2016 15:37, Alex Rukletsov wrote:
> Benno—
> 
> you may be seeing MESOS-4111
> <https://issues.apache.org/jira/browse/MESOS-4111>. Also, have a
> look at this comment:
> https://github.com/apache/mesos/blob/9f472b1eff904d0d96063d3bed535a8e81263d69/src/launcher/executor.cpp#L611-L617
> 
> On Tue, May 3, 2016 at 2:49 PM, Evers Benno <be...@yandex-team.ru> wrote:
> 
>> [...]

Re: Status acknowledgements in MesosExecutor

Posted by Anand Mazumdar <an...@mesosphere.io>.
Also, we would be modifying the agent to always acknowledge status updates from the executor (MESOS-5262 <https://issues.apache.org/jira/browse/MESOS-5262>).

Once that is done, it should be sufficient for an executor to terminate itself on receiving an acknowledgement message from the agent, instead of relying on the best-effort hack of sleeping for some duration.
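
To sketch what that could look like on the executor side (note: the
acknowledged() callback below is hypothetical; it assumes the
acknowledgements actually get plumbed through to the executor driver,
which is exactly the part that doesn't exist yet):

    from mesos.interface import Executor, mesos_pb2

    class MyExecutor(Executor):
        def launchTask(self, driver, task):
            # ... run the task, then report the terminal state ...
            status = mesos_pb2.TaskStatus()
            status.task_id.value = task.task_id.value
            status.state = mesos_pb2.TASK_FINISHED
            driver.sendStatusUpdate(status)

        def acknowledged(self, driver, task_id, uuid):  # hypothetical callback
            # Only shut down once the agent has confirmed the terminal
            # update, instead of sleeping and hoping for the best.
            driver.stop()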

-anand

> On May 3, 2016, at 6:37 AM, Alex Rukletsov <al...@mesosphere.com> wrote:
> 
> Benno—
> 
> you may be seeing MESOS-4111
> <https://issues.apache.org/jira/browse/MESOS-4111>. Also, have a look at
> this comment:
> https://github.com/apache/mesos/blob/9f472b1eff904d0d96063d3bed535a8e81263d69/src/launcher/executor.cpp#L611-L617
> 
> On Tue, May 3, 2016 at 2:49 PM, Evers Benno <be...@yandex-team.ru> wrote:
> 
>> [...]


Re: Status acknowledgements in MesosExecutor

Posted by Alex Rukletsov <al...@mesosphere.com>.
Benno—

you may be seeing MESOS-4111
<https://issues.apache.org/jira/browse/MESOS-4111>. Also, have a look at
this comment:
https://github.com/apache/mesos/blob/9f472b1eff904d0d96063d3bed535a8e81263d69/src/launcher/executor.cpp#L611-L617

On Tue, May 3, 2016 at 2:49 PM, Evers Benno <be...@yandex-team.ru> wrote:

> [...]

Re: Status acknowledgements in MesosExecutor

Posted by Anand Mazumdar <an...@mesosphere.io>.
Hi Evers,

Thanks for taking this on. Vinod has agreed to shepherd this and I would be happy to be the initial reviewer for the patches.

-anand


> On Jun 1, 2016, at 10:27 AM, Evers Benno <be...@yandex-team.ru> wrote:
> 
> [...]
> 
> It would be great if someone would like to shepherd this.


Re: Status acknowledgements in MesosExecutor

Posted by Evers Benno <be...@yandex-team.ru>.
Some more context about this bug:

We did some tests with a framework that does nothing but send empty
tasks and a sample executor that does nothing but send TASK_FINISHED
and shut itself down.
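
For reference, the sample executor is essentially the following
(trimmed and paraphrased, not the exact code we ran):

    import sys

    import mesos.native
    from mesos.interface import Executor, mesos_pb2

    class FinishImmediatelyExecutor(Executor):
        def launchTask(self, driver, task):
            # Report TASK_FINISHED and stop right away -- this immediate
            # stop after the terminal update is exactly the racy part.
            status = mesos_pb2.TaskStatus()
            status.task_id.value = task.task_id.value
            status.state = mesos_pb2.TASK_FINISHED
            driver.sendStatusUpdate(status)
            driver.stop()

    if __name__ == "__main__":
        driver = mesos.native.MesosExecutorDriver(FinishImmediatelyExecutor())
        sys.exit(0 if driver.run() == mesos_pb2.DRIVER_STOPPED else 1)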

Running on two virtual machines on the same host (i.e. no network
involved), we see TASK_FAILED in about 3% of all tasks (271 out of
9,000). Adding a few megabytes of data to update.data, this can go up
to 80%. In all cases where I looked manually, the logs look like this
(IDs shortened to three characters for better readability):

[...]
I0502 14:40:33.151075 394179 slave.cpp:3002] Handling status update
TASK_FINISHED (UUID: 20c) for task 24c of framework f20 from
executor(1)@[2a02:6b8:0:1a16::165]:49266
I0502 14:40:33.151175 394179 slave.cpp:3528]
executor(1)@[2a02:6b8:0:1a16::165]:49266 exited
I0502 14:40:33.151190 394179 slave.cpp:3886] Executor 'executor_24c' of
framework f20 exited with status 0
I0502 14:40:33.151216 394179 slave.cpp:3002] Handling status update
TASK_FAILED (UUID: 01b) for task 24c of framework f20 from @0.0.0.0:0
[...]

This failure rate is a bit too high to ignore, so we're currently
writing and testing a patch that makes executor shutdown wait for
acknowledgements of all outstanding status updates.

It would be great if someone would like to shepherd this.

Best regards,
Benno

On 03.05.2016 14:49, Evers Benno wrote:
> [...]