Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2015/06/09 01:06:01 UTC

[jira] [Commented] (MESOS-2035) Add reason to containerizer proto Termination

    [ https://issues.apache.org/jira/browse/MESOS-2035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14578025#comment-14578025 ] 

Jie Yu commented on MESOS-2035:
-------------------------------

This problem came up again while we were implementing oversubscription. See MESOS-2653 and https://reviews.apache.org/r/34720/ for details.

Here is my proposal for solving this issue:

1) Add a TaskStatus::Reason field to the containerizer::Termination protobuf (and deprecate the 'killed' field).
2) In the slave's per-executor data structure (struct Executor), maintain an optional 'reason' field. When the slave destroys a container (e.g., due to a registration timeout, a failure to set resource limits, a failure to launch the container, a QoS controller kill, etc.), it saves the reason in struct Executor.
3) The containerizer is responsible for setting the 'reason' field inside containerizer::Termination (e.g., REASON_MEMORY_LIMIT, REASON_DISK_LIMIT, etc.).
4) In sendExecutorTerminatedStatusUpdate, we look at both reasons (one from the slave's executor data structure and one from the Termination protobuf). The current proposal is to prefer the reason from the Termination protobuf. In the future, when we allow multiple reasons to be sent (MESOS-2657), we can send both to the scheduler.
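
To make step 4 concrete, here is a minimal sketch of the selection logic. It is not the actual Mesos code: the Reason enum, the two structs, and chooseReason are hypothetical stand-ins for TaskStatus::Reason, containerizer::Termination, struct Executor, and the lookup inside sendExecutorTerminatedStatusUpdate. The fallback to a general reason when neither side set one is an assumption, modeled on the general reason mentioned for MESOS-1830.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-in for TaskStatus::Reason; only a few values are listed.
enum class Reason {
  MEMORY_LIMIT,
  DISK_LIMIT,
  EXECUTOR_REGISTRATION_TIMEOUT,
  EXECUTOR_TERMINATED  // General fallback (assumption).
};

// Hypothetical stand-in for containerizer::Termination (steps 1 and 3).
struct Termination {
  // Set by the containerizer, e.g. when an isolator enforces a limit.
  std::optional<Reason> reason;
};

// Hypothetical stand-in for the slave's struct Executor (step 2).
struct Executor {
  // Saved by the slave when it destroys the container itself.
  std::optional<Reason> reason;
};

// Step 4: prefer the containerizer's reason, then the slave's saved reason,
// and fall back to a general reason if neither is set (assumption).
inline Reason chooseReason(const Termination& termination,
                           const Executor& executor) {
  if (termination.reason) {
    return *termination.reason;
  }
  if (executor.reason) {
    return *executor.reason;
  }
  return Reason::EXECUTOR_TERMINATED;
}

// Usage example: both sides recorded a reason, so the containerizer's wins.
inline void exampleUsage() {
  Termination termination;
  termination.reason = Reason::MEMORY_LIMIT;

  Executor executor;
  executor.reason = Reason::EXECUTOR_REGISTRATION_TIMEOUT;

  assert(chooseReason(termination, executor) == Reason::MEMORY_LIMIT);
}
```

Once multiple reasons can be sent (MESOS-2657), a function like this could return both values instead of picking one.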

> Add reason to containerizer proto Termination
> ---------------------------------------------
>
>                 Key: MESOS-2035
>                 URL: https://issues.apache.org/jira/browse/MESOS-2035
>             Project: Mesos
>          Issue Type: Improvement
>          Components: slave
>    Affects Versions: 0.21.0
>            Reporter: Dominic Hamon
>            Assignee: Joerg Schad
>            Priority: Minor
>
> When an isolator kills a task, the reason is unknown. As part of MESOS-1830, the reason is set to a general one, but ideally the termination reason would be passed through to the status update.


