Posted to issues@mesos.apache.org by "Qian Zhang (JIRA)" <ji...@apache.org> on 2018/02/14 12:51:00 UTC

[jira] [Commented] (MESOS-8468) `LAUNCH_GROUP` failure tears down the default executor.

    [ https://issues.apache.org/jira/browse/MESOS-8468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363883#comment-16363883 ] 

Qian Zhang commented on MESOS-8468:
-----------------------------------

https://reviews.apache.org/r/65616/

> `LAUNCH_GROUP` failure tears down the default executor.
> -------------------------------------------------------
>
>                 Key: MESOS-8468
>                 URL: https://issues.apache.org/jira/browse/MESOS-8468
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.2.0, 1.3.0, 1.4.0, 1.5.0
>            Reporter: Chun-Hung Hsiao
>            Assignee: Gastón Kleiman
>            Priority: Critical
>              Labels: default-executor, mesosphere
>
> The following code in the default executor (https://github.com/apache/mesos/blob/12be4ba002f2f5ff314fbc16af51d095b0d90e56/src/launcher/default_executor.cpp#L525-L535) shows that if a `LAUNCH_NESTED_CONTAINER` call fails (say, due to a fetcher failure), the whole executor is shut down:
> {code:cpp}
> // Check if we received a 200 OK response for all the
> // `LAUNCH_NESTED_CONTAINER` calls. Shutdown the executor
> // if this is not the case.
> foreach (const Response& response, responses.get()) {
>   if (response.code != process::http::Status::OK) {
>     LOG(ERROR) << "Received '" << response.status << "' ("
>                << response.body << ") while launching child container";
>     _shutdown();
>     return;
>   }
> }
> {code}
> This is not what a user would expect. A failed `LAUNCH_GROUP` should not affect other task groups launched by the same executor, just as a task failure only takes down its own task group. We should adjust the semantics so that a failed `LAUNCH_GROUP` neither takes down the executor nor affects other task groups.
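
The semantics proposed in the issue description can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the actual Mesos change: `Response`, `GroupState`, and `ExecutorModel` are hypothetical stand-ins invented for illustration, and the status updates a real executor would send for the failed tasks are not modeled.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an HTTP response; not the Mesos/libprocess type.
struct Response {
  int code;
  std::string body;
};

enum class GroupState { RUNNING, FAILED };

// Hypothetical, simplified executor model that tracks one state per task group.
class ExecutorModel {
public:
  // Processes the responses to the launch calls of a single task group.
  // A non-200 response marks only that group as failed; other groups and
  // the executor itself are left untouched.
  void onLaunchResponses(
      const std::string& groupId,
      const std::vector<Response>& responses) {
    for (const Response& response : responses) {
      if (response.code != 200) {
        std::cerr << "Received " << response.code << " (" << response.body
                  << ") while launching child container of group '"
                  << groupId << "'" << std::endl;
        groups_[groupId] = GroupState::FAILED;
        return;  // Deliberately no executor-wide shutdown here.
      }
    }
    groups_[groupId] = GroupState::RUNNING;
  }

  // Returns the recorded state; throws std::out_of_range for unknown groups.
  GroupState state(const std::string& groupId) const {
    return groups_.at(groupId);
  }

private:
  std::map<std::string, GroupState> groups_;
};
```

In this model, a group whose launch fails ends up in `GroupState::FAILED`, while a group launched earlier by the same `ExecutorModel` stays in `GroupState::RUNNING`.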



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)