Posted to issues@mesos.apache.org by "Zhitao Li (JIRA)" <ji...@apache.org> on 2016/10/19 08:58:58 UTC
[jira] [Comment Edited] (MESOS-4945) Garbage collect unused docker layers in the store.

    [ https://issues.apache.org/jira/browse/MESOS-4945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15586813#comment-15586813 ] 

Zhitao Li edited comment on MESOS-4945 at 10/19/16 8:58 AM:
------------------------------------------------------------

Revised plan in rough steps:
* For each image, checkpoint a) the IDs of the containers using it, b) the time at which the last container using it was destroyed, and c) the size of each layer;
    ** TODO: how to deal with migration? One idea is to pass more information through the recover() chain of containerizer -> provisioner -> store;
* Change the store interface:
    ** Change "get(Image)" to "get(Image, ContainerID)";
        *** The added ContainerID argument can be used to implement reference counting and further bookkeeping (i.e. getting information about local images);
    ** Add a "remove(Image, ContainerID)" virtual function;
        *** This is optional: a store that does not do reference counting can skip implementing it.
* Make sure provisioner::destroy() calls store::remove(Image, ContainerID);
* Add a command line flag for the docker store capacity limit;
* In (docker) store::get(Image, ContainerID), after a pull is done, calculate the total size of all layers. If it is above the store capacity, remove images with no container IDs (i.e. not in use), sorted by the time they were last used. Any layer that is no longer used is removed as well, until the total size drops below the capacity.
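
The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Mesos code: SketchStore, ImageRecord and the string typedefs are hypothetical stand-ins for the real Image, ContainerID and store types, the pull is modelled by passing the layer sizes in directly, checkpointing is left out, and evicting the longest-unused image first is one possible reading of "sorted by last time not used".

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real Mesos types.
using ImageName = std::string;
using ContainerId = std::string;
using LayerId = std::string;

struct ImageRecord {
  std::set<ContainerId> containers;    // a) containers currently using the image
  int64_t lastUnused = 0;              // b) when the last container was destroyed
  std::map<LayerId, uint64_t> layers;  // c) size in bytes of each layer
};

class SketchStore {
public:
  explicit SketchStore(uint64_t capacity) : capacity_(capacity) {}

  // Models get(Image, ContainerID). `layers` stands in for the pull result.
  void get(const ImageName& image, const ContainerId& id,
           const std::map<LayerId, uint64_t>& layers) {
    ImageRecord& record = images_[image];
    record.layers = layers;
    record.containers.insert(id);
    gc();
  }

  // Models remove(Image, ContainerID), as called from provisioner destroy.
  void remove(const ImageName& image, const ContainerId& id, int64_t now) {
    auto it = images_.find(image);
    if (it == images_.end()) return;
    it->second.containers.erase(id);
    if (it->second.containers.empty()) it->second.lastUnused = now;
  }

  // Total size of distinct layers; a layer shared by images counts once.
  uint64_t totalSize() const {
    std::map<LayerId, uint64_t> distinct;
    for (const auto& image : images_)
      for (const auto& layer : image.second.layers)
        distinct[layer.first] = layer.second;
    uint64_t sum = 0;
    for (const auto& layer : distinct) sum += layer.second;
    return sum;
  }

  bool has(const ImageName& image) const { return images_.count(image) > 0; }

private:
  // Evict unused images, longest-unused first, until within capacity.
  // Layers referenced only by evicted images disappear with them.
  void gc() {
    std::vector<std::pair<int64_t, ImageName>> unused;
    for (const auto& image : images_)
      if (image.second.containers.empty())
        unused.emplace_back(image.second.lastUnused, image.first);
    std::sort(unused.begin(), unused.end());
    for (const auto& candidate : unused) {
      if (totalSize() <= capacity_) break;
      images_.erase(candidate.second);
    }
  }

  uint64_t capacity_;
  std::map<ImageName, ImageRecord> images_;
};
```

Note that in this sketch an image that still has containers is never evicted, so the store can stay above capacity while everything in it is in use.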

Open questions:

1) In this design, the store keeps an explicit reference count between {{Container}} and {{Image}}. However, the same information could be constructed on the fly from all containers in the {{Containerizer}} class. Do we consider this "double accounting" problematic or error-prone?
2) Is calling the new {{remove(Image, ContainerID)}} from {{Provisioner::destroy()}} sufficient to make sure all bookkeeping is properly done?
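
To make question 1 more concrete, here is a small sketch of how the two accountings could be compared. Everything in it is hypothetical: RefCounts, rebuildFromContainers and findDrift are illustrative names, and the map of active containers stands in for whatever the containerizer actually holds in memory.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins for the real Mesos types.
using ContainerId = std::string;
using ImageName = std::string;
using RefCounts = std::map<ImageName, std::set<ContainerId>>;

// Rebuild image references on the fly from the active containers.
RefCounts rebuildFromContainers(
    const std::map<ContainerId, ImageName>& active) {
  RefCounts refs;
  for (const auto& entry : active) refs[entry.second].insert(entry.first);
  return refs;
}

// Holders of an image; a missing entry is treated as "no holders".
std::set<ContainerId> holdersOf(const RefCounts& refs,
                                const ImageName& image) {
  auto it = refs.find(image);
  if (it == refs.end()) return std::set<ContainerId>();
  return it->second;
}

// Images on which the store's bookkeeping and the rebuilt view disagree.
std::set<ImageName> findDrift(const RefCounts& storeSide,
                              const RefCounts& rebuilt) {
  std::set<ImageName> drift;
  for (const auto& entry : storeSide)
    if (entry.second != holdersOf(rebuilt, entry.first))
      drift.insert(entry.first);
  for (const auto& entry : rebuilt)
    if (entry.second != holdersOf(storeSide, entry.first))
      drift.insert(entry.first);
  return drift;
}
```

A check like this could run during recovery to detect the case where the two views have diverged, for example after a missed remove() call.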


was (Author: zhitao):
Current plan:

- Add a "cleanup" method to the store interface, which takes a {{vector<Image>}} of "images in use";
- The store can choose its own implementation of what it wants to clean up. Deleted images will be returned in a {{Future<vector<Image>>}};
- It is the job of the Containerizer/Provisioner to actively prepare the list of "images in use";
    - initially this can simply be done by traversing all active containers, if the provisioner already has all the information in memory;
- The initial implementation will add a new flag indicating the upper limit on the size of the docker store directory, and docker::store will delete images until it drops below that limit;
- The invocation of store::cleanup can happen in a background timer, upon provisioner::destroy, or before the pull. (I have no real preference, but calling it before the pull seems safest if we use a space-based policy?);
- The initial implementation in the store will traverse all images in the store;
- Further optimizations include implementing reference counting and size accounting for all images in the store, and checkpointing them. We might also need some kind of LRU implementation here.
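
For comparison with the revised plan, the earlier "cleanup" idea might look roughly like this. It is a simplified sketch under several assumptions: CleanupStore and its members are hypothetical names, the call is synchronous and returns a plain vector rather than a {{Future<vector<Image>>}}, sizes are tracked per image rather than per layer, and images are visited in name order because no LRU ordering exists yet.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the real Mesos Image type.
using ImageName = std::string;

class CleanupStore {
public:
  // `limit` models the proposed flag for the store directory size.
  explicit CleanupStore(uint64_t limit) : limit_(limit), used_(0) {}

  // Record an image and its on-disk size in bytes.
  void add(const ImageName& image, uint64_t size) {
    auto it = sizes_.find(image);
    if (it != sizes_.end()) used_ -= it->second;
    sizes_[image] = size;
    used_ += size;
  }

  uint64_t usedBytes() const { return used_; }

  // Traverse all images and delete those not in `inUse` until the
  // store is within its limit. Returns the deleted images.
  std::vector<ImageName> cleanup(const std::vector<ImageName>& inUse) {
    std::set<ImageName> keep(inUse.begin(), inUse.end());
    std::vector<ImageName> deleted;
    auto it = sizes_.begin();
    while (it != sizes_.end() && used_ > limit_) {
      if (keep.count(it->first) > 0) {
        ++it;
        continue;
      }
      used_ -= it->second;
      deleted.push_back(it->first);
      it = sizes_.erase(it);
    }
    return deleted;
  }

private:
  uint64_t limit_;
  uint64_t used_;
  std::map<ImageName, uint64_t> sizes_;
};
```

The caller (the Containerizer/Provisioner in the plan above) would build the in-use list from its active containers before each call.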

> Garbage collect unused docker layers in the store.
> --------------------------------------------------
>
>                 Key: MESOS-4945
>                 URL: https://issues.apache.org/jira/browse/MESOS-4945
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Jie Yu
>            Assignee: Zhitao Li
>
> Right now, we don't have any garbage collection in place for docker layers. It's not straightforward to implement because we don't know what container is currently using the layer. We probably need a way to track the current usage of layers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)