You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/09/10 08:37:48 UTC

Re: Review Request: Slave GC based on disk usage

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 6:37 a.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

updated and diff'ed against apache/trunk


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  src/tests/protobuf_io_tests.cpp 5cc167d 
  src/tests/protobuf_io_tests.cpp.orig PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 14, 2012, 11:17 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

Ben's


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  src/tests/utils.hpp 54da799 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 13, 2012, 6:21 p.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 52
> > <https://reviews.apache.org/r/6704/diff/8/?file=152700#file152700line52>
> >
> >     Need a destructor where you loop through all PathInfo objects and discard and delete the Promise<bool>.

good catch, fixed.


> On Sept. 13, 2012, 6:21 p.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 145
> > <https://reviews.apache.org/r/6704/diff/8/?file=152703#file152703line145>
> >
> >     Add a TODO to put this in utils.hpp for other peoples consumption. Also, check out the LaunchTasks action to determine if this solves your problem, or if you could modify LaunchTasks to do so (no worries if you don't do anything about it now, just want to educate ourselves so we don't build two different things which do the same thing).

damn, spent too much trying to use LaunchTasks instead of createTask. should've gone with a TODO :)


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11478
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11478
-----------------------------------------------------------

Ship it!


Minor comments in addition to comments from last review.


src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24583>

    Need a destructor where you loop through all PathInfo objects and discard and delete the Promise<bool>.



src/tests/gc_tests.cpp
<https://reviews.apache.org/r/6704/#comment24595>

    Add a TODO to put this in utils.hpp for other peoples consumption. Also, check out the LaunchTasks action to determine if this solves your problem, or if you could modify LaunchTasks to do so (no worries if you don't do anything about it now, just want to educate ourselves so we don't build two different things which do the same thing).


- Benjamin Hindman


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 11, 2012, 11:06 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

ben's


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {
> 
> Vinod Kone wrote:
>     i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!

Just to be clear, if some time has elapsed, shouldn't remaining be 0 (note that the semantics of remaining are _not_ return negative time). And if no time has elapsed, shouldn't remaining be 0 (since timeout == Clock::now)?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1547
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1547>
> >
> >     We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!
> 
> Vinod Kone wrote:
>     i'm guessing you have a reason to be inconclusive about this? fixed for now.

Here's my reasoning: in many languages an assert might get removed from a non-debug build, so putting stuff in assert statements was a big no no. In our case, CHECK never gets removed, but it feels enough like an assert that I don't like the habit of sticking stuff in there.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?
> 
> Vinod Kone wrote:
>     yea, its tricky! short of mocking the slave, i'm not sure what other options we have.
>     
>     'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.
>     
>     is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.

I think two things would need to get mocked in this case. First, you'd want to mock GarbageCollector (and inject it via the constructor) so that we can see when we get prune calls. Second, you'd want to mock os::usage so that it returns a usage value which wold cause the slave to call prune. The former seems trivial, the latter, not so much. Here's one suggestion (we should seriously think about this so that we can write proper tests): create a class OS which includes all of our functions (either as virtual functions or via fast delegates), create a global instance of that called 'os', and replace all calls to 'os::xxx' with 'os.xxx'. Then, any time you want to mock one of those functions, just replace the global 'os' variable with a subclass of OS that mocks what's necessary (i.e., overrides a virtual function or replaces a delegate mapping). Thoughts on this? We shouldn't include this in the review, but re-evaluate it in the near future!


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?
> 
> Vinod Kone wrote:
>     I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.
>     
>     I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.
>     
>     Thoughts?

I think we just move this to the 'fs' namespace (object! ;) in the near future.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {
> 
> Vinod Kone wrote:
>     i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!
> 
> Benjamin Hindman wrote:
>     Just to be clear, if some time has elapsed, shouldn't remaining be 0 (note that the semantics of remaining are _not_ return negative time). And if no time has elapsed, shouldn't remaining be 0 (since timeout == Clock::now)?

aha. i see. fixed. but i still think, timer.creator() better captures the fact that it has not been initialized yet (and it fits on one line :)))?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?
> 
> Vinod Kone wrote:
>     yea, its tricky! short of mocking the slave, i'm not sure what other options we have.
>     
>     'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.
>     
>     is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.
> 
> Benjamin Hindman wrote:
>     I think two things would need to get mocked in this case. First, you'd want to mock GarbageCollector (and inject it via the constructor) so that we can see when we get prune calls. Second, you'd want to mock os::usage so that it returns a usage value which wold cause the slave to call prune. The former seems trivial, the latter, not so much. Here's one suggestion (we should seriously think about this so that we can write proper tests): create a class OS which includes all of our functions (either as virtual functions or via fast delegates), create a global instance of that called 'os', and replace all calls to 'os::xxx' with 'os.xxx'. Then, any time you want to mock one of those functions, just replace the global 'os' variable with a subclass of OS that mocks what's necessary (i.e., overrides a virtual function or replaces a delegate mapping). Thoughts on this? We shouldn't include this in the review, but re-evaluate it in the near future!

I think both, mocking gc and os, are great solutions. I will add a TODO. I don't think mocking just gc doesn't help us with this test, untill os is mocked out.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?
> 
> Vinod Kone wrote:
>     I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.
>     
>     I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.
>     
>     Thoughts?
> 
> Benjamin Hindman wrote:
>     I think we just move this to the 'fs' namespace (object! ;) in the near future.

sure. but, i dont think fs::du() would be any clear.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 97
> > <https://reviews.apache.org/r/6704/diff/7/?file=151993#file151993line97>
> >
> >     As I mentioned before, I really would prefer that we mention that this number is really just an upper bound and changes based on disk usage, something like:
> >     
> >     Maximum amount of time to wait before cleaning up\n
> >     executor directories (e.g., 3days, 2 weeks, etc).\n
> >     Note that this delay may be shorter depending on\n
> >     available disk usage

I swear I fixed this before, but somehow got some how got lost in my rebase quagmire. fixed


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 49
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line49>
> >
> >     Previously I've embedded a struct like this directly into the class (just above the map that holds these). See src/zookeeper/group.cpp.

done


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {

i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 154
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line154>
> >
> >     What is the reason to dispatch here?

Because I don't want remove() to modify 'paths', while prune() is iterating over it.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1543
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1543>
> >
> >     Please don't crash the slave if for whatever reason a file gets added to the directory that is not a number.

done


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1547
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1547>
> >
> >     We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!

i'm guessing you have a reason to be inconclusive about this? fixed for now.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?

I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.

I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.

Thoughts?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?

yea, its tricky! short of mocking the slave, i'm not sure what other options we have.

'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.

is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 128
> > <https://reviews.apache.org/r/6704/diff/7/?file=151996#file151996line128>
> >
> >     I do not think we should proliferate this kind of thing. Please see the comment in the test itself as well.

i agree, see my comment below in test.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 10, 2012, 6:39 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 6:39 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------



src/slave/flags.hpp
<https://reviews.apache.org/r/6704/#comment24317>

    As I mentioned before, I really would prefer that we mention that this number is really just an upper bound and changes based on disk usage, something like:
    
    Maximum amount of time to wait before cleaning up\n
    executor directories (e.g., 3days, 2 weeks, etc).\n
    Note that this delay may be shorter depending on\n
    available disk usage



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24316>

    Previously I've embedded a struct like this directly into the class (just above the map that holds these). See src/zookeeper/group.cpp.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24314>

    The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
    
    if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24312>

    const &



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24313>

    > 0



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24311>

    s/.//



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24310>

    const &



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24315>

    What is the reason to dispatch here?



src/slave/slave.hpp
<https://reviews.apache.org/r/6704/#comment24309>

    I do not think we should proliferate this kind of thing. Please see the comment in the test itself as well.



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24298>

    Please don't crash the slave if for whatever reason a file gets added to the directory that is not a number.



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24308>

    We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24318>

    s/atleast/at least/



src/tests/gc_tests.cpp
<https://reviews.apache.org/r/6704/#comment24307>

    Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?



third_party/libprocess/include/stout/os.hpp
<https://reviews.apache.org/r/6704/#comment24303>

    Rather than the weird wrapping here, why not put all of open(... on the newline? For example:
    
    Try<int> fd =
      open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IRWXO);



third_party/libprocess/include/stout/os.hpp
<https://reviews.apache.org/r/6704/#comment24302>

    Any reason not to call this du?


- Benjamin Hindman


On Sept. 10, 2012, 6:39 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 6:39 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 6:39 a.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

please ignore previous diff(6).


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone