Posted to dev@mesos.apache.org by "Till Toenshoff (JIRA)" <ji...@apache.org> on 2014/05/06 02:40:15 UTC

[jira] [Comment Edited] (MESOS-1243) Containerizer::wait return type should be Option<Termination>

    [ https://issues.apache.org/jira/browse/MESOS-1243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990141#comment-13990141 ] 

Till Toenshoff edited comment on MESOS-1243 at 5/6/14 12:38 AM:
----------------------------------------------------------------

Recovery:
Right now {{recover}} is not specific to a single container or executor, so it shouldn't fail just because one of them could not be recovered, whatever the reason.
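
To illustrate the point, here is a minimal sketch of a recovery loop that records unrecoverable containers instead of aborting on the first one. It is not the Mesos implementation: {{ContainerState}}, {{recoverAll}} and the {{recoverOne}} hook are hypothetical names made up for this example.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a checkpointed container; not the Mesos type.
struct ContainerState {
  std::string containerId;
};

// Tries to recover every container and returns the ids that could not be
// recovered, rather than failing the whole recovery on the first error.
// `recoverOne` is a hypothetical per-container hook; it returns true on
// success and false when that one container is not recoverable.
std::vector<std::string> recoverAll(
    const std::vector<ContainerState>& states,
    const std::function<bool(const ContainerState&)>& recoverOne) {
  std::vector<std::string> unrecoverable;
  for (const ContainerState& state : states) {
    if (!recoverOne(state)) {
      // Remember the failure and carry on with the remaining containers.
      unrecoverable.push_back(state.containerId);
    }
  }
  return unrecoverable;
}
```

The caller can then decide per container what to do with the returned ids, for example clean up their checkpoints.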

Let me sketch this from the ExternalContainerizer's (EC) point of view in a failure scenario:
The slave invokes {{launch}} and the EC tries to pass this on to the external containerizer program (ECP). Now assume the slave dies before the ECP is actually able to launch anything. After a {{recover}}, the slave assumes that the ECP will be able to {{wait}} on that container. The ECP, however, never launched that container, so it is unable to {{wait}} on it and therefore cannot return a {{Termination}}.

So the problem has to be viewed with one thing in mind: the ECP and the slave may disagree about the state of a container.

The quick way out of this is to make that {{Termination}} optional. Another way might be to make sure that the container is only checkpointed after the launch has fully completed?
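
A rough sketch of the optional variant, under some loud assumptions: it uses C++17 {{std::optional}} as a stand-in for stout's {{Option}}, it is synchronous for brevity, and {{ToyContainerizerProgram}} and the simplified {{Termination}} struct are hypothetical names for this example, not the real Mesos types.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical, heavily simplified stand-in for the Termination message.
struct Termination {
  int status;
};

// Toy model of the ECP's view: it only knows containers it launched itself.
class ToyContainerizerProgram {
public:
  // Records a container as launched, together with its eventual exit status.
  void launch(const std::string& containerId, int exitStatus) {
    launched_[containerId] = Termination{exitStatus};
  }

  // Returns the termination for a known container, or std::nullopt when the
  // container was never launched here (e.g. the slave died mid-launch).
  std::optional<Termination> wait(const std::string& containerId) const {
    auto it = launched_.find(containerId);
    if (it == launched_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

private:
  std::map<std::string, Termination> launched_;
};
```

With this shape, a slave that recovered a checkpointed container the ECP never launched gets an empty result from {{wait}} rather than an error, and can treat that as "nothing to reap" for this {{ContainerID}}.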


> Containerizer::wait return type should be Option<Termination>
> -------------------------------------------------------------
>
>                 Key: MESOS-1243
>                 URL: https://issues.apache.org/jira/browse/MESOS-1243
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Till Toenshoff
>            Priority: Minor
>              Labels: containerizer, external-containerizer, isolation, mesos, mesos-containerizer
>
> The containerizer {{wait}} should return an {{Option<Termination>}} to distinguish the case when it doesn't know about a {{ContainerID}}.


