Posted to issues@mesos.apache.org by "Benjamin Bannier (JIRA)" <ji...@apache.org> on 2019/01/09 20:39:00 UTC

[jira] [Comment Edited] (MESOS-9223) Storage local provider does not sufficiently handle container launch failures or errors

    [ https://issues.apache.org/jira/browse/MESOS-9223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16726759#comment-16726759 ] 

Benjamin Bannier edited comment on MESOS-9223 at 1/9/19 8:38 PM:
-----------------------------------------------------------------

Reviews:
[r/69606|https://reviews.apache.org/r/69606/]
-[r/69607|https://reviews.apache.org/r/69607/]-


was (Author: bbannier):
Reviews:
https://reviews.apache.org/r/69606/
https://reviews.apache.org/r/69607/ 

> Storage local provider does not sufficiently handle container launch failures or errors
> ---------------------------------------------------------------------------------------
>
>                 Key: MESOS-9223
>                 URL: https://issues.apache.org/jira/browse/MESOS-9223
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent, storage
>            Reporter: Benjamin Bannier
>            Assignee: Benjamin Bannier
>            Priority: Critical
>
> The storage local resource provider as currently implemented does not handle launch failures or task errors of its standalone containers well enough. If, e.g., an RP container fails to come up during node start, a warning is logged, but an operator still needs to detect the degraded functionality, manually check the state of containers with {{GET_CONTAINERS}}, and decide whether the agent needs restarting; I suspect they do not always have enough context for this decision. It would be better if the provider either enforced a restart by failing over the whole agent, or retried the operation (optionally: up to some maximum number of retries).
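
The two options named in the description (retry up to a limit, otherwise fail over the agent) could be combined roughly as in the sketch below. This is only an illustration under stated assumptions: `launchWithRetries`, `LaunchOutcome`, and the `launch` callback are hypothetical names and not part of the Mesos code base, and how a failover would actually be triggered is left to the caller.

```cpp
#include <cassert>
#include <functional>

// Hypothetical outcome of trying to bring up a provider container.
enum class LaunchOutcome { Launched, FailOverAgent };

// Calls `launch` once, then up to `maxRetries` more times while it fails.
// `launch` is a stand-in for whatever starts the standalone container and
// returns true on success; it is an assumption, not a Mesos API.
LaunchOutcome launchWithRetries(
    const std::function<bool()>& launch,
    unsigned maxRetries)
{
  assert(launch && "launch callback must be set");

  for (unsigned attempt = 0;; ++attempt) {
    if (launch()) {
      return LaunchOutcome::Launched;
    }

    if (attempt == maxRetries) {
      break; // Retry budget exhausted.
    }

    // A real implementation would likely back off before the next attempt.
  }

  // Escalate instead of only logging a warning and leaving the operator
  // to notice the degraded state.
  return LaunchOutcome::FailOverAgent;
}
```

With `maxRetries` set to 0 the sketch degenerates to "fail over on the first launch failure", which corresponds to the first option in the description.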



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)