You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Gilbert Song <so...@gmail.com> on 2017/10/03 01:19:28 UTC

Re: Review Request 56721: Implemented pruneImages with a mark and sweep in docker store.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56721/#review186320
-----------------------------------------------------------




src/slave/containerizer/containerizer.hpp
Lines 156 (patched)
<https://reviews.apache.org/r/56721/#comment262815>

    Could we add some comments?



src/slave/containerizer/containerizer.hpp
Lines 156-157 (patched)
<https://reviews.apache.org/r/56721/#comment263289>

    maybe return failure in this virtual function by default? so that you don't need to do that on docker containerizer.



src/slave/containerizer/docker.hpp
Lines 112-114 (patched)
<https://reviews.apache.org/r/56721/#comment263290>

    not needed.



src/slave/containerizer/docker.cpp
Lines 873-880 (patched)
<https://reviews.apache.org/r/56721/#comment263291>

    not needed.



src/slave/containerizer/mesos/containerizer.cpp
Lines 2628 (patched)
<https://reviews.apache.org/r/56721/#comment263693>

    Need to comment here since it depends on the checkpointed containerconfig introduced recently.



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 145-147 (patched)
<https://reviews.apache.org/r/56721/#comment263766>

    two more spaces?



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 209-211 (patched)
<https://reviews.apache.org/r/56721/#comment263776>

    three scenarios:
    1. checkpointed cache is cleaned up for some reasons
    2. the `activeImages` contains `excludedImages`, which the operator specifys some images that are never pulled before on this agent.
    
    and I just think of the 3rd case:
    3. if a couple big images are still being pulled (pulling started before `prune()`), it means thay are not in `storedImages` yet.
    4. same as #3, we just unlock the store and resume the image pulling, then the requests in queue will be executed simultanuously, but another prune() call come in from the operator for some reason.
    
    for the case of #3 and #4, we may need to fix:
    https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_puller.cpp#L384~#L388
    
    because container launch may fail if any of the layers are included in those ongoing pull but still get marked from pre-existed unused images.



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 212 (patched)
<https://reviews.apache.org/r/56721/#comment263717>

    extra space?



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 217 (patched)
<https://reviews.apache.org/r/56721/#comment263786>

    It means your cache is always empty.
    
    `retainedImages` instead?



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 219 (patched)
<https://reviews.apache.org/r/56721/#comment263787>

    s/image.get().layer_ids()/image->layer_ids()/g



src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
Lines 228 (patched)
<https://reviews.apache.org/r/56721/#comment263788>

    rename it as `unusedLayers`?



src/slave/containerizer/mesos/provisioner/docker/paths.hpp
Lines 30-42 (original), 30-42 (patched)
<https://reviews.apache.org/r/56721/#comment263708>

    Could you fix the comment as well?



src/slave/containerizer/mesos/provisioner/docker/paths.cpp
Lines 106 (patched)
<https://reviews.apache.org/r/56721/#comment263714>

    Understand this dir is used for sweeping only since marking is already done. but as a helper, we should follow whatever named for this dir, e.g., `gc`.
    
    maybe call it `getGcPath()`? (we should use the same patern in this file e.g., xxxxPath(), the `getStagingDir()` is a tech debt).



src/slave/containerizer/mesos/provisioner/docker/paths.cpp
Lines 112 (patched)
<https://reviews.apache.org/r/56721/#comment263715>

    getGcLayerPath()?



src/slave/containerizer/mesos/provisioner/docker/paths.cpp
Lines 114-115 (patched)
<https://reviews.apache.org/r/56721/#comment263712>

    avoid this extra var?



src/slave/containerizer/mesos/provisioner/docker/paths.cpp
Lines 115 (patched)
<https://reviews.apache.org/r/56721/#comment263718>

    are you prefer namoseconds than milliseconds for precise?



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 133 (patched)
<https://reviews.apache.org/r/56721/#comment263730>

    what do you think renaming this struct to `ImagePullingInfo`? since we are having a queue of `Queued` sounds a little weird.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 145 (patched)
<https://reviews.apache.org/r/56721/#comment263807>

    any reason it is not a boolean?



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 146 (patched)
<https://reviews.apache.org/r/56721/#comment263731>

    s/queued/imagePullingQueue/g



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 280 (patched)
<https://reviews.apache.org/r/56721/#comment263729>

    VLOG(1) << "Queuing to get the image '" << name << "' since images are being pruned";



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 281 (patched)
<https://reviews.apache.org/r/56721/#comment263733>

    newline above.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 518 (patched)
<https://reviews.apache.org/r/56721/#comment263740>

    s/Another pruning is already happening!/Another image pruning is happening/g



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 536 (patched)
<https://reviews.apache.org/r/56721/#comment263811>

    should be a boolean. and basically this is the lock. we should put it at the very beginning of this method, right below the check of any other pruning happening.
    
    also, let's comment on this lock.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 551 (patched)
<https://reviews.apache.org/r/56721/#comment263819>

    The store is already unlocked



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 575 (patched)
<https://reviews.apache.org/r/56721/#comment263806>

    ditto.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 582 (patched)
<https://reviews.apache.org/r/56721/#comment263812>

    quote the path and no `!` at the end?



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 586 (patched)
<https://reviews.apache.org/r/56721/#comment263813>

    ditto.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 589 (patched)
<https://reviews.apache.org/r/56721/#comment263814>

    VLOG(1)? since there might be hundreds of layers.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 589-590 (patched)
<https://reviews.apache.org/r/56721/#comment263815>

    quote these 3 vars. basically we should quote all variables that are not generated by mesos.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 591 (patched)
<https://reviews.apache.org/r/56721/#comment263816>

    newline above.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 598 (patched)
<https://reviews.apache.org/r/56721/#comment263818>

    what does this comment mean?



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 619 (patched)
<https://reviews.apache.org/r/56721/#comment263821>

    quote the path.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 621 (patched)
<https://reviews.apache.org/r/56721/#comment263822>

    ditto.



src/slave/containerizer/mesos/provisioner/docker/store.cpp
Lines 623 (patched)
<https://reviews.apache.org/r/56721/#comment263823>

    ditto.


- Gilbert Song


On Sept. 26, 2017, 11:15 a.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56721/
> -----------------------------------------------------------
> 
> (Updated Sept. 26, 2017, 11:15 a.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-4945
>     https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This includes the following changes:
> - add a `pruneImages()` function on the chain of relevant classes;
> - implement prune in docker store;
> - fix mock interface to keep existing tests pass.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.hpp 06d68eef5de7745e32f0e808f11016bcc285dd8f 
>   src/slave/containerizer/composing.cpp 587f009384f0c7ef87482686578dc822d3d5b208 
>   src/slave/containerizer/containerizer.hpp 449bb5d0902936faae7bf9bae9c703b219aed842 
>   src/slave/containerizer/docker.hpp b602a5698cae12686f51c4b9370a06042cda6270 
>   src/slave/containerizer/docker.cpp 292eecbca246edf068ec8c262aff4f3ce9cd8c67 
>   src/slave/containerizer/mesos/containerizer.hpp cc23b4d91be16fc95a131c09d07378b801e34d6f 
>   src/slave/containerizer/mesos/containerizer.cpp 4d5dc13f363f5d8886983d7dd06a5cecc177c345 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.hpp 954da1681778878c8aff6150002e52ecb648d1bb 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp d86afd2a6ff0bf87e624db1c99255c85068bf6ab 
>   src/slave/containerizer/mesos/provisioner/docker/paths.hpp 232c027f8f96da0cb30b957bce4607d3695050d2 
>   src/slave/containerizer/mesos/provisioner/docker/paths.cpp cd684b33eb308ce1eeb4539a5b2d51985d835db7 
>   src/slave/containerizer/mesos/provisioner/docker/store.hpp 1cf68665d33bd40a7605d26c96fb7b618407fdd0 
>   src/slave/containerizer/mesos/provisioner/docker/store.cpp f357710cb19aec3654b0604f7909d068eaf20095 
>   src/slave/containerizer/mesos/provisioner/provisioner.hpp 7cba54ce490d1e6e17081cd7e04fd6759ceddb8e 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 450a3b32d69d2882973a6ed4e94e169a0256056b 
>   src/slave/containerizer/mesos/provisioner/store.hpp 01ab83dca79e51b8c96d18ee65705912b0ac8324 
>   src/slave/containerizer/mesos/provisioner/store.cpp cc5cc81e05f29bb0e11ffa13cdb8d63d4397114f 
>   src/tests/containerizer.hpp a778b8581904bacea9eec3ff50c3c009959b5dac 
>   src/tests/containerizer.cpp cd140f4263621a0a33a34b7e062a9ca6cf426e7a 
>   src/tests/containerizer/mock_containerizer.hpp 0adcb01e6c12d6cc4abed1f14fa2df833ffc6569 
> 
> 
> Diff: https://reviews.apache.org/r/56721/diff/8/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>


Re: Review Request 56721: Implemented pruneImages with a mark and sweep in docker store.

Posted by Zhitao Li <zh...@gmail.com>.

> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/containerizer.hpp
> > Lines 156-157 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791102#file1791102line156>
> >
> >     maybe return failure in this virtual function by default? so that you don't need to do that on docker containerizer.

I thought about return value when the underlying containerizer does not support `pruneImages` yet. I think returning `Nothing` will be better than failure so that users who mixes Mesos Containerizer and docker containerizer can still use the functionality without dealing with failures.

w.r.t. return something by default, I don't particularly like that because current `Containerizer` class is a pure interface class (i.e, all virtual functions are pure) and I feel keeping it this way is more consistent.

Also, it's actually possible to implement a reasonable `pruneImages` in DockerContainerizer (with something similar to https://github.com/spotify/docker-gc), so I'd like to keep this possibility there.

Please let me know if you think otherwise.


> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/docker.hpp
> > Lines 112-114 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791103#file1791103line112>
> >
> >     not needed.

See comments above.


> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/docker.cpp
> > Lines 873-880 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791104#file1791104line873>
> >
> >     not needed.

See comments above.


> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/provisioner/docker/paths.cpp
> > Lines 115 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791110#file1791110line115>
> >
> >     are you prefer namoseconds than milliseconds for precise?

Yeah nanoseconds seems less possible to have a conflict. Is there a reason to prefer milliseconds though?


> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/provisioner/docker/store.cpp
> > Lines 536 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791112#file1791112line536>
> >
> >     should be a boolean. and basically this is the lock. we should put it at the very beginning of this method, right below the check of any other pruning happening.
> >     
> >     also, let's comment on this lock.

Added comment.

I don't think this should be moved to before input validation above.


> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/provisioner/docker/store.cpp
> > Lines 598 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791112#file1791112line598>
> >
> >     what does this comment mean?

I thought about tracking disk space size of sweeping directory (disk usage or something else), but I realized that any prolonged non-empty sweeping directory is pretty much a bug, so I guess I'll remove this TODO.


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56721/#review186320
-----------------------------------------------------------


On Oct. 3, 2017, 5:13 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56721/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2017, 5:13 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-4945
>     https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This includes the following changes:
> - add a `pruneImages()` function on the chain of relevant classes;
> - implement prune in docker store;
> - fix mock interface to keep existing tests pass.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.hpp 06d68eef5de7745e32f0e808f11016bcc285dd8f 
>   src/slave/containerizer/composing.cpp 587f009384f0c7ef87482686578dc822d3d5b208 
>   src/slave/containerizer/containerizer.hpp 449bb5d0902936faae7bf9bae9c703b219aed842 
>   src/slave/containerizer/docker.hpp b602a5698cae12686f51c4b9370a06042cda6270 
>   src/slave/containerizer/docker.cpp 292eecbca246edf068ec8c262aff4f3ce9cd8c67 
>   src/slave/containerizer/mesos/containerizer.hpp cc23b4d91be16fc95a131c09d07378b801e34d6f 
>   src/slave/containerizer/mesos/containerizer.cpp 4d5dc13f363f5d8886983d7dd06a5cecc177c345 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.hpp 954da1681778878c8aff6150002e52ecb648d1bb 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp d86afd2a6ff0bf87e624db1c99255c85068bf6ab 
>   src/slave/containerizer/mesos/provisioner/docker/paths.hpp 232c027f8f96da0cb30b957bce4607d3695050d2 
>   src/slave/containerizer/mesos/provisioner/docker/paths.cpp cd684b33eb308ce1eeb4539a5b2d51985d835db7 
>   src/slave/containerizer/mesos/provisioner/docker/store.hpp 1cf68665d33bd40a7605d26c96fb7b618407fdd0 
>   src/slave/containerizer/mesos/provisioner/docker/store.cpp f357710cb19aec3654b0604f7909d068eaf20095 
>   src/slave/containerizer/mesos/provisioner/provisioner.hpp 7cba54ce490d1e6e17081cd7e04fd6759ceddb8e 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 450a3b32d69d2882973a6ed4e94e169a0256056b 
>   src/slave/containerizer/mesos/provisioner/store.hpp 01ab83dca79e51b8c96d18ee65705912b0ac8324 
>   src/slave/containerizer/mesos/provisioner/store.cpp cc5cc81e05f29bb0e11ffa13cdb8d63d4397114f 
>   src/tests/containerizer.hpp a778b8581904bacea9eec3ff50c3c009959b5dac 
>   src/tests/containerizer.cpp cd140f4263621a0a33a34b7e062a9ca6cf426e7a 
>   src/tests/containerizer/mock_containerizer.hpp 0adcb01e6c12d6cc4abed1f14fa2df833ffc6569 
> 
> 
> Diff: https://reviews.apache.org/r/56721/diff/9/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>


Re: Review Request 56721: Implemented pruneImages with a mark and sweep in docker store.

Posted by Zhitao Li <zh...@gmail.com>.

> On Oct. 3, 2017, 1:19 a.m., Gilbert Song wrote:
> > src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp
> > Lines 209-211 (patched)
> > <https://reviews.apache.org/r/56721/diff/7/?file=1791108#file1791108line209>
> >
> >     three scenarios:
> >     1. checkpointed cache is cleaned up for some reasons
> >     2. the `activeImages` contains `excludedImages`, which the operator specifys some images that are never pulled before on this agent.
> >     
> >     and I just think of the 3rd case:
> >     3. if a couple big images are still being pulled (pulling started before `prune()`), it means thay are not in `storedImages` yet.
> >     4. same as #3, we just unlock the store and resume the image pulling, then the requests in queue will be executed simultanuously, but another prune() call come in from the operator for some reason.
> >     
> >     for the case of #3 and #4, we may need to fix:
> >     https://github.com/apache/mesos/blob/master/src/slave/containerizer/mesos/provisioner/docker/registry_puller.cpp#L384~#L388
> >     
> >     because container launch may fail if any of the layers are included in those ongoing pull but still get marked from pre-existed unused images.

Case 1 means an operator or unknown manually played with checkpointed cache, so there is nothing we can really do;
Case 2 is an acceptable case IMO;
Case 3 and case 4 could be real problem. One straightfoward fix I can imagine is to require some "mutual exclusive" between any image pulling and pruning. For example, if `store` class received a `pruning` request, we simply lock the store but wait until ongoing image pulling to finish (and not admit any future image pulling until this pruning finishes its marking phase).

I understand this has some throughput implication but it seems easiest to implement. What do you think?


- Zhitao


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56721/#review186320
-----------------------------------------------------------


On Oct. 3, 2017, 5:13 p.m., Zhitao Li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56721/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2017, 5:13 p.m.)
> 
> 
> Review request for mesos, Gilbert Song, Jason Lai, and Jie Yu.
> 
> 
> Bugs: MESOS-4945
>     https://issues.apache.org/jira/browse/MESOS-4945
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> This includes the following changes:
> - add a `pruneImages()` function on the chain of relevant classes;
> - implement prune in docker store;
> - fix mock interface to keep existing tests pass.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/composing.hpp 06d68eef5de7745e32f0e808f11016bcc285dd8f 
>   src/slave/containerizer/composing.cpp 587f009384f0c7ef87482686578dc822d3d5b208 
>   src/slave/containerizer/containerizer.hpp 449bb5d0902936faae7bf9bae9c703b219aed842 
>   src/slave/containerizer/docker.hpp b602a5698cae12686f51c4b9370a06042cda6270 
>   src/slave/containerizer/docker.cpp 292eecbca246edf068ec8c262aff4f3ce9cd8c67 
>   src/slave/containerizer/mesos/containerizer.hpp cc23b4d91be16fc95a131c09d07378b801e34d6f 
>   src/slave/containerizer/mesos/containerizer.cpp 4d5dc13f363f5d8886983d7dd06a5cecc177c345 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.hpp 954da1681778878c8aff6150002e52ecb648d1bb 
>   src/slave/containerizer/mesos/provisioner/docker/metadata_manager.cpp d86afd2a6ff0bf87e624db1c99255c85068bf6ab 
>   src/slave/containerizer/mesos/provisioner/docker/paths.hpp 232c027f8f96da0cb30b957bce4607d3695050d2 
>   src/slave/containerizer/mesos/provisioner/docker/paths.cpp cd684b33eb308ce1eeb4539a5b2d51985d835db7 
>   src/slave/containerizer/mesos/provisioner/docker/store.hpp 1cf68665d33bd40a7605d26c96fb7b618407fdd0 
>   src/slave/containerizer/mesos/provisioner/docker/store.cpp f357710cb19aec3654b0604f7909d068eaf20095 
>   src/slave/containerizer/mesos/provisioner/provisioner.hpp 7cba54ce490d1e6e17081cd7e04fd6759ceddb8e 
>   src/slave/containerizer/mesos/provisioner/provisioner.cpp 450a3b32d69d2882973a6ed4e94e169a0256056b 
>   src/slave/containerizer/mesos/provisioner/store.hpp 01ab83dca79e51b8c96d18ee65705912b0ac8324 
>   src/slave/containerizer/mesos/provisioner/store.cpp cc5cc81e05f29bb0e11ffa13cdb8d63d4397114f 
>   src/tests/containerizer.hpp a778b8581904bacea9eec3ff50c3c009959b5dac 
>   src/tests/containerizer.cpp cd140f4263621a0a33a34b7e062a9ca6cf426e7a 
>   src/tests/containerizer/mock_containerizer.hpp 0adcb01e6c12d6cc4abed1f14fa2df833ffc6569 
> 
> 
> Diff: https://reviews.apache.org/r/56721/diff/9/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Zhitao Li
> 
>