Posted to dev@mesos.apache.org by "Niklas Quarfot Nielsen (JIRA)" <ji...@apache.org> on 2014/05/02 00:02:16 UTC

[jira] [Updated] (MESOS-938) Support task replacement and scaling

     [ https://issues.apache.org/jira/browse/MESOS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Niklas Quarfot Nielsen updated MESOS-938:
-----------------------------------------

    Summary: Support task replacement and scaling  (was: Add replace task primitive)

> Support task replacement and scaling
> ------------------------------------
>
>                 Key: MESOS-938
>                 URL: https://issues.apache.org/jira/browse/MESOS-938
>             Project: Mesos
>          Issue Type: Improvement
>          Components: c++ api, master, slave
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Niklas Quarfot Nielsen
>         Attachments: resource-flow.png, sequence.png
>
>
> A replaceTask primitive could allow easier up- and down-scaling by reusing the resources of an old task either partially (leaving the resource delta available), fully, or augmented (backed by additional offers) when launching a new task. Further, recovery logic can restart the old task if the new task fails.
> The signature could be something like:
> replaceTask(TaskID oldTaskId, TaskInfo newTaskInfo, vector<Offer> offers)
> A suggestion of the semantics of replaceTask could be as follows:
> 1) The framework issues replaceTask with the task id of an old running task, a new task info, and optionally a set of offers: replaceTask(oldTaskId, newTaskInfo, offers)
> 2) The master validates the offers and the task (with respect to the resource requirements section below) by reusing the offer and task visitors, and sends a replaceTask request to the slave.
>      a) If the new task violates resource requirements or consistency, TASK_LOST is reported for the new task and the entire request ends.
> 3) The slave stores and checkpoints the request (a tuple of the old task id and the new task info), sends killTask to the executor, and starts a timer. Auxiliary resources (from the optional offers) are reserved before issuing the killTask.
>      a) If the executor is reregistering, enqueue the entire request, reserve resources, and start a timer.
>      b) If the executor is terminating or terminated, TASK_LOST is reported for the new task, as the old task's resources cannot be reused.
>      c) If the old task is unknown, it might be about to be reregistered by an executor. Enqueue the entire request, reserve resources, and start a timer.
>      d) If the timer times out before a status update for a terminal state (e.g. TASK_KILLED/TERMINATED) has been received, the reserved resources are released, TASK_LOST is sent for the new task, and the replace request is removed. If the executor doesn't report anything, this is no different from an unresponsive killTask().
> NOTE: Timers started at request start and during executor reregistration can be merged.
> 4) The executor sends a terminal status update for the old task, as the usual response to killTask(). However, if the old task id is marked for replacement, the slave now intercepts the stage between releasing the resources and announcing the change to the isolator. In this stage, if the new task consumes less than the old task, the unneeded resources are announced available to the isolator.
> Thereafter, a launchTask request for the new task is sent to the executor. Reserved resources are released (but not announced available) so they become usable by the regular launchTask() request.
> 5) Start a timer and await the TASK_RUNNING update. If a terminal status update (or no update at all) is received within the timeout, a rollback is attempted: if the new task uses resources greater than or equal to the old task's, the old task should be restartable. If the old task is successfully restarted, a TASK_ROLLEDBACK status update rather than TASK_RUNNING should be reported to the framework.
> Resource requirements:
> The resources required for the new task must be covered by:
> - Using the same or fewer resources than the old task: T_{new} <= T_{old}
> - Using more than the old task's resources, but no more than the sum of the old resources and the aggregated offer resources: T_{new} <= T_{old} + O_{1} + … + O_{n}
> Any resources provided but not used by the new task are announced as available.
> Fault tolerance (and open questions):
> - What happens if master fails after receiving a replaceTask request?
>      - Master does not store any state on replacements.
> - What happens if slave fails after receiving a replaceTask request?
>      - The slave must become aware of replacements in flight, which requires either 1) checkpointing replacement requests to stable storage, or 2) making the executor aware of the replace request and letting reregistration reestablish lost requests.
> - What happens if slave fails while waiting for executor behavior?
>      - Waiting for executor to run
>      - Waiting for old task to reregister
>      - Killing old task while holding up resources for new task
>      - Just after receiving a terminal update for the old task.
>      - Monitoring new task startup
> - Reconciliation behavior
> This can end up being a complicated operation in the slave, but requires virtually no change in executors.
> An alternative approach could relieve the slave of the replace logic by pushing the responsibility onto executors.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)