Posted to dev@myriad.apache.org by "Adam B (JIRA)" <ji...@apache.org> on 2015/11/06 12:42:27 UTC

[jira] [Commented] (MYRIAD-18) staging - pending loop

    [ https://issues.apache.org/jira/browse/MYRIAD-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14993548#comment-14993548 ] 

Adam B commented on MYRIAD-18:
------------------------------

TASK_LOST can occur for many reasons, including a network partition or a lost/crashed agent. Generally this status implies that restarting the task may succeed, as opposed to TASK_FAILED/TASK_ERROR, where a retry is likely/guaranteed to fail again.
Other TASK_LOST scenarios:
- The scheduler driver is disconnected from the Mesos master at the time of an acceptOffers (e.g. launchTasks) call from the scheduler.
- Accept/Launch call uses invalid/rescinded offers. (Maybe this should be a TASK_ERROR?)
- Master asked to launch a task on an agent that has since been removed or disconnected.
- Tried to reconcile a task unknown to Mesos.
- When a master discovers that an agent (slave) process has exited, it reports TASK_LOST for any tasks from non-checkpointing frameworks.
- If an agent is shut down or removed completely, all of its tasks will report TASK_LOST.
- Upon agent reregistration, any tasks known by the master but unknown by the agent will report TASK_LOST.
- Agent could not launch the task because it failed to unschedule directories for garbage collection.
- If the task/executor uses persistent volumes unknown to the agent.
- If the agent is asked to run a task using an existing executor that is terminating/terminated.
- Agent asked to killTask for an unrecognized executor.
- Executor reregistration timeout expired.
- Failed to update resources for executor container (e.g. grow to launch new task).
- Container/executor preempted by QoS controller.
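
To make the retry distinction above concrete, here is a minimal sketch in Java of how a scheduler might treat TASK_LOST as retryable while still bounding the number of retries, so that a task lost repeatedly does not cycle between staging and pending forever. This is not Myriad's actual code: the class name, the Action enum, the MAX_LOST_RETRIES limit and the locally declared TaskState enum are all hypothetical, and giving up immediately on TASK_FAILED/TASK_ERROR is an assumption based on the reasoning above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Myriad's actual scheduler code.
class TaskLostRetrySketch {
    // Subset of Mesos task states, redeclared locally so the sketch is self-contained.
    enum TaskState { TASK_LOST, TASK_FAILED, TASK_ERROR }

    // Hypothetical decision the scheduler would act on.
    enum Action { RETRY, GIVE_UP }

    // Assumed limit; a real scheduler would likely make this configurable.
    static final int MAX_LOST_RETRIES = 3;

    // Number of TASK_LOST updates seen so far, per task ID.
    private final Map<String, Integer> lostCounts = new HashMap<>();

    Action onTerminalStatus(String taskId, TaskState state) {
        switch (state) {
            case TASK_LOST:
                // A lost task may succeed on relaunch, but cap the retries
                // so a persistently lost task cannot loop indefinitely.
                int attempts = lostCounts.merge(taskId, 1, Integer::sum);
                return attempts <= MAX_LOST_RETRIES ? Action.RETRY : Action.GIVE_UP;
            case TASK_FAILED:
            case TASK_ERROR:
            default:
                // Assumption: a retry is likely or guaranteed to fail again.
                return Action.GIVE_UP;
        }
    }
}
```

With the assumed limit of 3, the first three TASK_LOST updates for a given task ID yield RETRY and the fourth yields GIVE_UP. A RETRY decision would correspond to making the task pending again, and GIVE_UP to removing it or surfacing an error.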

> staging - pending loop
> ----------------------
>
>                 Key: MYRIAD-18
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-18
>             Project: Myriad
>          Issue Type: Bug
>            Reporter: Maysam Yabandeh
>
> If a staging task is lost for any reason, it gets stuck in a staging-pending loop.
>             case TASK_LOST:
>                 schedulerState.makeTaskPending(taskId);
>                 break;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)