Posted to issues@aurora.apache.org by "Maxim Khutornenko (JIRA)" <ji...@apache.org> on 2015/07/27 20:44:05 UTC

[jira] [Commented] (AURORA-1404) Reconcile ASSIGNED tasks that have not transitioned to STARTING

    [ https://issues.apache.org/jira/browse/AURORA-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643199#comment-14643199 ] 

Maxim Khutornenko commented on AURORA-1404:
-------------------------------------------

The response time for stuck {{ASSIGNED}} tasks can be improved via AURORA-1370. I think it's generally more robust to kill and reschedule an {{ASSIGNED}} task than to retry a {{launchTasks}} call for something that's already in flight.

> Reconcile ASSIGNED tasks that have not transitioned to STARTING
> ---------------------------------------------------------------
>
>                 Key: AURORA-1404
>                 URL: https://issues.apache.org/jira/browse/AURORA-1404
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Joshua Cohen
>
> If the Mesos master fails over after Aurora moves a task to {{ASSIGNED}} but before the slave receives the message, the task will never transition and will eventually be timed out by [TaskTimeout|https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/TaskTimeout.java].
> Instead, it would be better to have a mechanism similar to [KillRetry|https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/KillRetry.java] that ensures assigned tasks have transitioned to a running state and, if they have not, transitions them to {{LOST}}.
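
The mechanism described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class {{AssignedTaskReconciler}}, its {{Status}} enum, the in-memory task map, and the grace period parameter are all hypothetical stand-ins and not Aurora's actual classes or storage API. It shows the core check only: any task that has stayed in {{ASSIGNED}} for at least the grace period is moved to {{LOST}}.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: none of these types are Aurora's real classes.
final class AssignedTaskReconciler {
  enum Status { ASSIGNED, STARTING, RUNNING, LOST }

  // In-memory stand-in for a task store entry.
  private static final class TaskRecord {
    Status status;
    Instant since; // time of the most recent status change

    TaskRecord(Status status, Instant since) {
      this.status = status;
      this.since = since;
    }
  }

  // TreeMap keeps iteration order deterministic (sorted by task ID).
  private final Map<String, TaskRecord> tasks = new TreeMap<>();
  private final Duration gracePeriod;

  AssignedTaskReconciler(Duration gracePeriod) {
    this.gracePeriod = gracePeriod;
  }

  // Records a status change reported for a task.
  synchronized void recordTransition(String taskId, Status status, Instant now) {
    tasks.put(taskId, new TaskRecord(status, now));
  }

  // Returns the last known status, or null for an unknown task.
  synchronized Status statusOf(String taskId) {
    TaskRecord record = tasks.get(taskId);
    return record == null ? null : record.status;
  }

  // Moves every task that has been ASSIGNED for at least the grace period
  // to LOST and returns the IDs of the tasks that were moved.
  synchronized List<String> reconcile(Instant now) {
    List<String> lost = new ArrayList<>();
    for (Map.Entry<String, TaskRecord> entry : tasks.entrySet()) {
      TaskRecord record = entry.getValue();
      boolean stuck = record.status == Status.ASSIGNED
          && !now.isBefore(record.since.plus(gracePeriod));
      if (stuck) {
        record.status = Status.LOST;
        record.since = now;
        lost.add(entry.getKey());
      }
    }
    return lost;
  }
}
```

In a scheduler, {{reconcile}} would presumably run on a periodic timer, and rescheduling would follow from the {{LOST}} transition; neither is modeled here. Marking the task {{LOST}} rather than re-sending the launch avoids starting the same task twice if the original message did eventually arrive.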



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)