Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/12/11 02:50:07 UTC

[jira] [Closed] (MESOS-875) A recovering slave should not ignore valid status updates.

     [ https://issues.apache.org/jira/browse/MESOS-875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler closed MESOS-875.
---------------------------------

    Resolution: Not A Problem

I filed this a bit prematurely:

Here, the executor sent an update while the slave was down. When that happens, the executor driver caches the update so that it can flush it once the slave reconnects to the driver.

However, in this case, the executor exited before the executor driver was able to process the reconnect and re-send the update.

For correctness, executor authors need to know that an executor must not send an update and then exit while the slave is down; otherwise its tasks will be lost.
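
To make the failure mode concrete, here is a toy model of the sequence described above. It is only a sketch and does not use the Mesos API: ToyExecutorDriver, slave_down, slave_reconnected and run are hypothetical names invented for illustration. It assumes the driver holds cached updates in the executor's own process memory, so they disappear if the executor exits before the reconnect is processed.

```python
# Toy model only: none of these names come from the Mesos API.
class ToyExecutorDriver:
    def __init__(self):
        self.connected = True
        self.pending = []    # updates cached while the slave is down
        self.delivered = []  # updates the slave has actually received

    def send_status_update(self, update):
        if self.connected:
            self.delivered.append(update)
        else:
            # Slave is down: cache the update for a later flush.
            self.pending.append(update)

    def slave_down(self):
        self.connected = False

    def slave_reconnected(self):
        # Flush everything that was cached while disconnected.
        self.connected = True
        self.delivered.extend(self.pending)
        self.pending.clear()


def run(exit_before_reconnect):
    """Return the updates the slave received in each scenario."""
    driver = ToyExecutorDriver()
    driver.slave_down()
    driver.send_status_update("TASK_FINISHED")
    if exit_before_reconnect:
        # The executor process exits here; the cached update dies with it.
        return list(driver.delivered)
    driver.slave_reconnected()
    return list(driver.delivered)
```

In this model, run(True) delivers nothing while run(False) delivers the TASK_FINISHED update, which matches the behaviour seen here: the update is lost only because the executor exits before the flush.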

> A recovering slave should not ignore valid status updates.
> ----------------------------------------------------------
>
>                 Key: MESOS-875
>                 URL: https://issues.apache.org/jira/browse/MESOS-875
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Benjamin Mahler
>            Assignee: Vinod Kone
>            Priority: Critical
>             Fix For: 0.17.0
>
>
> This is a regression due to the bug fix for MESOS-732: https://reviews.apache.org/r/14616/
> Now that slave recovery is asynchronous, status updates coming from the executors will be ignored since the slave does not know about the framework until recovery is completed.
> Example:
> I1210 20:06:51.633050 54429 slave.cpp:1756] Handling status update TASK_FINISHED (UUID: foo) for task T of framework F from executor(1)@IP:PORT
> W1210 20:06:51.633128 54429 slave.cpp:1766] Ignoring status update TASK_FINISHED (UUID: foo) for task T of framework F for unknown framework F



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)