
Posted to issues@mesos.apache.org by "Qian Zhang (JIRA)" <ji...@apache.org> on 2016/09/13 01:20:21 UTC

[jira] [Commented] (MESOS-6149) Checkpoint used subsystems for containers

    [ https://issues.apache.org/jira/browse/MESOS-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15485880#comment-15485880 ] 

Qian Zhang commented on MESOS-6149:
-----------------------------------

I think MESOS-6063 already handles the case where the agent is restarted with more cgroups subsystems enabled, and this ticket is meant to handle the case where the agent is restarted with fewer cgroups subsystems enabled. For example, before the restart the enabled subsystems are {{cgroups/cpu,cgroups/mem,cgroups/net_cls}}, and after the restart they are {{cgroups/cpu,cgroups/mem}}, i.e., {{cgroups/net_cls}} is disabled after the agent is restarted.

However, I am not sure that checkpointing the used subsystems for a container helps with this case. Once a subsystem is disabled after the agent restarts, even if we have checkpointed the container's used subsystems, we still have no chance to do any cleanup for the disabled subsystem when the container terminates (because the agent will not call that subsystem at all), so the cgroups created for the container will be left behind as garbage.

One possible solution I have in mind: for a container created before the agent restarts, we keep using its checkpointed subsystems (even if some of them are disabled after the restart), but for new containers created after the restart, we use only the subsystems enabled in the agent.
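
To make the idea concrete, here is a rough sketch in plain Python. It is not Mesos code: the function names, the dictionary layout, and the container IDs {{c1}} and {{c2}} are all hypothetical, and it only models the selection rule, not the actual isolator calls.

```python
# Hypothetical model: subsystems checkpointed per container before the restart.
checkpointed = {"c1": {"cgroups/cpu", "cgroups/mem", "cgroups/net_cls"}}

# Hypothetical model: subsystems enabled on the agent after the restart.
enabled = {"cgroups/cpu", "cgroups/mem"}


def subsystems_for(container_id, checkpointed, enabled):
    """Return the subsystems the agent should use for one container."""
    # Recovered container: honor what was checkpointed, even if some of
    # those subsystems are no longer enabled on the agent.
    if container_id in checkpointed:
        return set(checkpointed[container_id])
    # New container: only the currently enabled subsystems apply.
    return set(enabled)


def orphaned_on_cleanup(container_id, checkpointed, enabled):
    """Subsystems whose cgroups would be left behind if cleanup
    only went through the currently enabled subsystems."""
    return set(checkpointed.get(container_id, ())) - set(enabled)
```

With the example from above, {{subsystems_for("c1", checkpointed, enabled)}} still includes {{cgroups/net_cls}}, so its cgroup can be cleaned up when the container terminates, while a new container {{c2}} only gets {{cgroups/cpu}} and {{cgroups/mem}}. {{orphaned_on_cleanup}} shows what would leak otherwise.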

> Checkpoint used subsystems for containers
> -----------------------------------------
>
>                 Key: MESOS-6149
>                 URL: https://issues.apache.org/jira/browse/MESOS-6149
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: haosdent
>            Assignee: haosdent
>
> In MESOS-6063, we have tracked recovered and prepared subsystems for containers. To make it work better, we could checkpoint this information and recover it after the agent restarts.


