You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@mesos.apache.org by Jacob Janco <jj...@gmail.com> on 2016/12/02 02:58:02 UTC

Re: Review Request 53479: Perform GC asynchronously.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.hpp, line 120
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554434#file1554434line120>
> >
> >     'removal' and 'rm(dir)' sound too synonymous...
> >     
> >     
> >     To distinguish the two promises, perhaps name this one `gc` (to indicate it's for the overall gc and gc implies a sense of "scheduled for the future" to me). In any case we should use comments to make clear what is for what. 
> >     
> >     e.g.,
> >     
> >     ```
> >     // The promise for the scheduled gc of this path.
> >     const process::Owned<process::Promise<Nothing>> gc;
> >     ```

Agreed, changed to gc/removal.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.hpp, lines 128-131
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554434#file1554434line128>
> >
> >     This is too low-level for a summary IMO. Some of the details, if necessary, could be put in the method definition.

Removed from summary, added a few more details to the method def.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, line 103
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line103>
> >
> >     Here we can do 
> >     
> >     ```
> >     // Return `false` to be consistent with the behavior when `unschedule` is called after the path is removed.
> >     return info.rmdir->future()
> >       .then([]() { return false; });
> >     ```

Makes sense, there isn't parity between true/false removal and true/false unschedule.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, line 199
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line199>
> >
> >     Would this be an abnormal case? If so we can log a warning.

I think this is defensive, removing. I included this before changes to `reset()`


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, lines 213-217
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line213>
> >
> >     4 space indentation and put `self()` on the next line.
> >     
> >     Also, it looks like instead of having one callback for each rmdir, we should be probably `await` on all of them and have the callback for the overall result. This is because we need to reset the timer only when the batch of `rmdir`s completes.
> >     
> >     Of course then the overall callback needs to be informed about which paths this list of futures are for.
> >     
> >     i.e., something like this
> >     
> >     ```
> >     hashmap<string, Future<Nothing>> futures;
> >     
> >     ... In foreach:
> >     
> >     futures[info.path] = async(&os::rmdir, info.path, true, true, true);
> >     
> >     ...
> >     
> >     await(values.values())
> >       .onAny(...);
> >     
> >     ```
> >     
> >     The `_remove()` method can be defined as:
> >     
> >     ```
> >     void GarbageCollectorProcess::_remove(
> >         const Timeout& removalTime,
> >         const hashmap<string, Future<Nothing>>& futures)
> >     {
> >     }
> >     ```

Passing this down into the callback and looking it up in paths/timeouts is cleaner.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, lines 225-228
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line225>
> >
> >     Because you
> >     
> >     ```
> >     if (it->second.rmdir.isSome()) {
> >       continue;
> >     }
> >     ```
> >     
> >     above, it's possible for method to not trigger a `reset()` at all if we move it like this?

If there is a removal to be done i.e. there are removalTime keys in our multimap the continuation `_remove` will trigger a `reset()`.


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, lines 117-118
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line117>
> >
> >     Not yours but this should probaly be
> >     
> >     ```
> >     UNREACHABLE() << "Inconsistent state across 'paths' and 'timeouts'";
> >     ```

Hmmm like UNREACHABLE(); with a log line?


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, lines 188-193
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line188>
> >
> >     I think we actually have to use the following?
> >     
> >     ```
> >     map<process::Timeout, hashmap<std::string, PathInfo>> paths;
> >     ```
> >     
> >     Because we rmdir asynchronously, when it's finished we need to retrieve the same `PathInfo` by both the removalTime and the path!
> >     
> >     If we do that, we can access all entries by reference so we won't have this problem. However we need to be cautious in cleaning up the outer entry when the inner map is all cleaned up.

Ah hmmm are you saying instead of carrying the timeouts and paths data structures, carry a timeout to path/pathinfo map? I changed the logic to use `paths.get(removalTime)` and index into the new `futures` hashmap with the list of `pathInfo`. Maybe this is cleaner?


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, line 216
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line216>
> >
> >     It's probably safe to pass a copy of `PathInfo` here but the fact that it's safe it's not obvious.
> >     
> >     We can probably just pass a `path` so in the continuation we can use `removalTime` and `path` to get the `PathInfo` back and we can just `CHECK` to assert that the entry should be there and add a comment on why it should be there (because we don't erase the any entry with a pending `rmdir`, we wait until it's finished).

Removed in favor of looking up in `_remove`


> On Nov. 12, 2016, 12:48 a.m., Jiang Yan Xu wrote:
> > src/slave/gc.cpp, lines 126-131
> > <https://reviews.apache.org/r/53479/diff/1/?file=1554435#file1554435line126>
> >
> >     As it's written this way (along with the single path `_remove()`), in each batch of `rmdir()`s, the quickest one is going to trigger the next reset, by which time the head of the `paths` could be the same batch.
> >     
> >     We won't have this problem if we invoke `_remove()` once per "batch", in which case we don't need to change this method here.

Right, this is unecessary with the batching since there is not trampling on toes when `_remove` streams in for each op.


- Jacob


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53479/#review155639
-----------------------------------------------------------


On Nov. 12, 2016, 12:48 a.m., Jacob Janco wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53479/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2016, 12:48 a.m.)
> 
> 
> Review request for mesos, Benjamin Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6549
>     https://issues.apache.org/jira/browse/MESOS-6549
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> Perform GC asynchronously.
> 
> 
> Diffs
> -----
> 
>   src/slave/gc.hpp 5ea82456cffa374203371f227b6f0ee00296553e 
>   src/slave/gc.cpp 961f547793984449ea113d9664b12b5033638c58 
> 
> Diff: https://reviews.apache.org/r/53479/diff/
> 
> 
> Testing
> -------
> 
> `make check`
> 
> 
> Thanks,
> 
> Jacob Janco
> 
>