Posted to dev@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2013/03/01 07:05:14 UTC

[jira] [Resolved] (MESOS-152) Slave should forward status updates for unknown tasks

     [ https://issues.apache.org/jira/browse/MESOS-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kone resolved MESOS-152.
------------------------------

    Resolution: Fixed
    
> Slave should forward status updates for unknown tasks
> -----------------------------------------------------
>
>                 Key: MESOS-152
>                 URL: https://issues.apache.org/jira/browse/MESOS-152
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Bill Farner
>            Assignee: Vinod Kone
>
> The slave swallows status updates for tasks that it does not recognize. Because of the way we handle tasks and history in the Twitter framework, it would be ideal if these updates were forwarded to the master anyway.
> Relevant code in slave.cpp:
>     Executor* executor = framework->getExecutor(status.task_id());
>     if (executor != NULL) {
>       executor->updateTaskState(status.task_id(), status.state());
>       // Handle the task appropriately if it's terminated.
>       if (status.state() == TASK_FINISHED ||
>           status.state() == TASK_FAILED ||
>           status.state() == TASK_KILLED ||
>           status.state() == TASK_LOST) {
>         executor->removeTask(status.task_id());
>         dispatch(isolationModule,
>                  &IsolationModule::resourcesChanged,
>                  framework->id, executor->id, executor->resources);
>       }
>       // Send message and record the status for possible resending.
>       StatusUpdateMessage message;
>       message.mutable_update()->MergeFrom(update);
>       message.set_pid(self());
>       send(master, message);
>       UUID uuid = UUID::fromBytes(update.uuid());
>       // Send us a message to try and resend after some delay.
>       delay(STATUS_UPDATE_RETRY_INTERVAL_SECONDS,
>             self(), &Slave::statusUpdateTimeout,
>             framework->id, uuid);
>       framework->updates[uuid] = update;
>       stats.tasks[status.state()]++;
>       stats.validStatusUpdates++;
>     } else {
>       LOG(WARNING) << "Status update error: couldn't lookup "
>                    << "executor for framework " << update.framework_id();
>       stats.invalidStatusUpdates++;
>     }
> Ideally, this code would behave more like:
>   Look up executor
>   if executor exists:
>     Update executor state
>   else:
>     Log warning
>   send message
> Of course, this is still in a scope where the framework is known.
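
The control flow proposed in the pseudocode above can be sketched as self-contained C++. This is only an illustration under stated assumptions, not the fix that was committed for MESOS-152: the types below (StatusUpdate, Executor, Framework, Slave) are simplified, hypothetical stand-ins for the real Mesos classes, the `forwarded` vector stands in for `send(master, message)`, and the retry timer and isolation module dispatch from the original snippet are left out. Counting an update for an unknown task as "invalid" mirrors the original else branch and is likewise an assumption.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Mesos types in slave.cpp.
enum TaskState { TASK_RUNNING, TASK_FINISHED, TASK_FAILED, TASK_KILLED, TASK_LOST };

struct StatusUpdate {
  std::string taskId;
  TaskState state;
};

struct Executor {
  std::map<std::string, TaskState> tasks;

  void updateTaskState(const std::string& taskId, TaskState state) {
    tasks[taskId] = state;
  }

  void removeTask(const std::string& taskId) { tasks.erase(taskId); }
};

struct Framework {
  std::vector<Executor> executors;

  // Returns the executor that owns the task, or nullptr if none does.
  Executor* getExecutor(const std::string& taskId) {
    for (Executor& executor : executors) {
      if (executor.tasks.count(taskId) > 0) {
        return &executor;
      }
    }
    return nullptr;
  }
};

bool isTerminal(TaskState state) {
  return state == TASK_FINISHED || state == TASK_FAILED ||
         state == TASK_KILLED || state == TASK_LOST;
}

struct Slave {
  std::vector<StatusUpdate> forwarded;  // Stand-in for send(master, message).
  int validStatusUpdates = 0;
  int invalidStatusUpdates = 0;

  void statusUpdate(Framework& framework, const StatusUpdate& update) {
    Executor* executor = framework.getExecutor(update.taskId);
    if (executor != nullptr) {
      // Known task: update local bookkeeping as before.
      executor->updateTaskState(update.taskId, update.state);
      if (isTerminal(update.state)) {
        executor->removeTask(update.taskId);
      }
      validStatusUpdates++;
    } else {
      // Unknown task: warn, but do not swallow the update.
      std::cerr << "Status update for unknown task " << update.taskId
                << "; forwarding anyway" << std::endl;
      invalidStatusUpdates++;
    }
    // Forward the update in both cases.
    forwarded.push_back(update);
  }
};
```

The only behavioral change relative to the quoted snippet is that the forwarding step sits outside the if/else, so an update for an unrecognized task still reaches the master.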
