You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/08/22 01:20:19 UTC

Re: Review Request: Slave GC based on disk usage

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Aug. 21, 2012, 11:20 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

Added a test and also refactored existing tests in gc.

This is now ready for review.


Summary (updated)
-----------------

Slave GC based on disk usage


Description (updated)
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp ab83972 
  src/slave/flags.hpp 0c7917f 
  src/slave/gc.hpp 6704742 
  src/slave/gc.cpp 9c01024 
  src/slave/slave.hpp 10c537b 
  src/slave/slave.cpp 4efd41e 
  src/tests/gc_tests.cpp 2f0bdde 
  third_party/libprocess/include/stout/os.hpp b1eceb3 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/constants.hpp, line 30
> > <https://reviews.apache.org/r/6704/diff/2/?file=143654#file143654line30>
> >
> >     so does 1 week seem correct?
> >     
> >     using the math you have below:
> >     disk 50%  
> >     max_age 3.5 days
> >     
> >     disk 75%
> >     max_age 1.75 days
> >     
> >     do we actually see the disks fill up this rapidly? perhaps initially we should be less agressive on this age, like set 2x or 3x on the current gc_timeout_hours
> >     
> >     of course, ben and you know much more about the typical disk usage patterns than I do, so let me know your opinion please :)

I agree. The current algorithm for calculating age is completely arbitrary. Not based on historical data. The (TODO) idea is for the age function to be expressed on command line. That makes it easy for us to tune it on the go (with a puppet run + master roll)


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/constants.hpp, line 31
> > <https://reviews.apache.org/r/6704/diff/2/?file=143654#file143654line31>
> >
> >     seems a bit frequent no? what does statvfs do under the hood?

changed it to 60 seconds. btw, statvfs is pretty fast (this is what 'df' uses). unlike 'du' which calculates disk space usage "right now", df uses cached information from something called a primary superblock of the filesystem. for more info: http://linuxshellaccount.blogspot.com/2008/12/why-du-and-df-display-different-values.html


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.hpp, line 23
> > <https://reviews.apache.org/r/6704/diff/2/?file=143656#file143656line23>
> >
> >     kill these new imports, and put them in the cpp file instead?

done


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.cpp, line 127
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line127>
> >
> >     this will just print true or false, would be nice to have the actual error here, so how about we LOG in os::rmdir() when anything goes wrong?
> >     
> >

the exit status shouldn't be there. i used it for debugging. removed it.

i think most (all?) our stout functions should be wrapped with Try/Result/Option. this would make it easier to grab their exit status at the call site. on the flip side, this makes the functions a bit cumbersome (call, check, get) to use. i will punt on this for now, may be another review (Technical debt :))


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/gc.cpp, line 146
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line146>
> >
> >     move the log line within the if? looks really noisy

done


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1591
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1591>
> >
> >     I'm not getting the math here:
> >     
> >     current gc timeout hours: 1 week
> >     suppose max age: 1 day
> >     
> >     then
> >     
> >     1 week - 1 day = 6 days
> >     
> >     so we would prune everything scheduled for deletion within the next 6 days...?

yes that's correct.  this is because, if a file's deletion time is within next 6 days then its atleast 1 day old! this stems from the fact that a directory is always scheduled for deletion 1 week into future from the time it becomes a candidate for deletion.

i know this might be hard to wrap one's head around. i will add a comment explaining this. but i'm to open to suggestions for making the math easier to understand.


> On Aug. 23, 2012, 2:28 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1588
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1588>
> >
> >     can you nicely format this number, like %.2f

its a bit involved to do this in C++ (set width, set precision, unset precision, unset width). 
since we haven't done that anywhere else in the code base, i'm gonna punt on this.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10603
-----------------------------------------------------------


On Aug. 21, 2012, 11:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10603
-----------------------------------------------------------



src/slave/constants.hpp
<https://reviews.apache.org/r/6704/#comment22858>

    so does 1 week seem correct?
    
    using the math you have below:
    disk 50%  
    max_age 3.5 days
    
    disk 75%
    max_age 1.75 days
    
    do we actually see the disks fill up this rapidly? perhaps initially we should be less agressive on this age, like set 2x or 3x on the current gc_timeout_hours
    
    of course, ben and you know much more about the typical disk usage patterns than I do, so let me know your opinion please :)



src/slave/constants.hpp
<https://reviews.apache.org/r/6704/#comment22691>

    seems a bit frequent no? what does statvfs do under the hood?



src/slave/gc.hpp
<https://reviews.apache.org/r/6704/#comment22692>

    kill these new imports, and put them in the cpp file instead?



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22857>

    this will just print true or false, would be nice to have the actual error here, so how about we LOG in os::rmdir() when anything goes wrong?
    
    



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22863>

    move the log line within the if? looks really noisy



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22859>

    can you nicely format this number, like %.2f



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22862>

    I'm not getting the math here:
    
    current gc timeout hours: 1 week
    suppose max age: 1 day
    
    then
    
    1 week - 1 day = 6 days
    
    so we would prune everything scheduled for deletion within the next 6 days...? 


- Ben Mahler


On Aug. 21, 2012, 11:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 24, 2012, 9:27 p.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 52
> > <https://reviews.apache.org/r/6704/diff/3/?file=144908#file144908line52>
> >
> >     Indent +2.

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10753
-----------------------------------------------------------


On Aug. 23, 2012, 7:43 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2012, 7:43 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
>   third_party/libprocess/include/stout/time.hpp 46d7077 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10753
-----------------------------------------------------------



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment23194>

    Indent +2.


- Benjamin Hindman


On Aug. 23, 2012, 7:43 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2012, 7:43 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
>   third_party/libprocess/include/stout/time.hpp 46d7077 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 14, 2012, 11:17 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

Ben's


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  src/tests/utils.hpp 54da799 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 13, 2012, 6:21 p.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 52
> > <https://reviews.apache.org/r/6704/diff/8/?file=152700#file152700line52>
> >
> >     Need a destructor where you loop through all PathInfo objects and discard and delete the Promise<bool>.

good catch, fixed.


> On Sept. 13, 2012, 6:21 p.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 145
> > <https://reviews.apache.org/r/6704/diff/8/?file=152703#file152703line145>
> >
> >     Add a TODO to put this in utils.hpp for other peoples consumption. Also, check out the LaunchTasks action to determine if this solves your problem, or if you could modify LaunchTasks to do so (no worries if you don't do anything about it now, just want to educate ourselves so we don't build two different things which do the same thing).

damn, spent too much trying to use LaunchTasks instead of createTask. should've gone with a TODO :)


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11478
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11478
-----------------------------------------------------------

Ship it!


Minor comments in addition to comments from last review.


src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24583>

    Need a destructor where you loop through all PathInfo objects and discard and delete the Promise<bool>.



src/tests/gc_tests.cpp
<https://reviews.apache.org/r/6704/#comment24595>

    Add a TODO to put this in utils.hpp for other peoples consumption. Also, check out the LaunchTasks action to determine if this solves your problem, or if you could modify LaunchTasks to do so (no worries if you don't do anything about it now, just want to educate ourselves so we don't build two different things which do the same thing).


- Benjamin Hindman


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 11, 2012, 11:06 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

ben's


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {
> 
> Vinod Kone wrote:
>     i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!

Just to be clear, if some time has elapsed, shouldn't remaining be 0 (note that the semantics of remaining are _not_ return negative time). And if no time has elapsed, shouldn't remaining be 0 (since timeout == Clock::now)?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1547
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1547>
> >
> >     We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!
> 
> Vinod Kone wrote:
>     i'm guessing you have a reason to be inconclusive about this? fixed for now.

Here's my reasoning: in many languages an assert might get removed from a non-debug build, so putting stuff in assert statements was a big no no. In our case, CHECK never gets removed, but it feels enough like an assert that I don't like the habit of sticking stuff in there.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?
> 
> Vinod Kone wrote:
>     yea, its tricky! short of mocking the slave, i'm not sure what other options we have.
>     
>     'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.
>     
>     is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.

I think two things would need to get mocked in this case. First, you'd want to mock GarbageCollector (and inject it via the constructor) so that we can see when we get prune calls. Second, you'd want to mock os::usage so that it returns a usage value which wold cause the slave to call prune. The former seems trivial, the latter, not so much. Here's one suggestion (we should seriously think about this so that we can write proper tests): create a class OS which includes all of our functions (either as virtual functions or via fast delegates), create a global instance of that called 'os', and replace all calls to 'os::xxx' with 'os.xxx'. Then, any time you want to mock one of those functions, just replace the global 'os' variable with a subclass of OS that mocks what's necessary (i.e., overrides a virtual function or replaces a delegate mapping). Thoughts on this? We shouldn't include this in the review, but re-evaluate it in the near future!


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?
> 
> Vinod Kone wrote:
>     I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.
>     
>     I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.
>     
>     Thoughts?

I think we just move this to the 'fs' namespace (object! ;) in the near future.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {
> 
> Vinod Kone wrote:
>     i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!
> 
> Benjamin Hindman wrote:
>     Just to be clear, if some time has elapsed, shouldn't remaining be 0 (note that the semantics of remaining are _not_ return negative time). And if no time has elapsed, shouldn't remaining be 0 (since timeout == Clock::now)?

aha. i see. fixed. but i still think, timer.creator() better captures the fact that it has not been initialized yet (and it fits on one line :)))?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?
> 
> Vinod Kone wrote:
>     yea, its tricky! short of mocking the slave, i'm not sure what other options we have.
>     
>     'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.
>     
>     is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.
> 
> Benjamin Hindman wrote:
>     I think two things would need to get mocked in this case. First, you'd want to mock GarbageCollector (and inject it via the constructor) so that we can see when we get prune calls. Second, you'd want to mock os::usage so that it returns a usage value which wold cause the slave to call prune. The former seems trivial, the latter, not so much. Here's one suggestion (we should seriously think about this so that we can write proper tests): create a class OS which includes all of our functions (either as virtual functions or via fast delegates), create a global instance of that called 'os', and replace all calls to 'os::xxx' with 'os.xxx'. Then, any time you want to mock one of those functions, just replace the global 'os' variable with a subclass of OS that mocks what's necessary (i.e., overrides a virtual function or replaces a delegate mapping). Thoughts on this? We shouldn't include this in the review, but re-evaluate it in the near future!

I think both, mocking gc and os, are great solutions. I will add a TODO. I don't think mocking just gc doesn't help us with this test, untill os is mocked out.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?
> 
> Vinod Kone wrote:
>     I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.
>     
>     I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.
>     
>     Thoughts?
> 
> Benjamin Hindman wrote:
>     I think we just move this to the 'fs' namespace (object! ;) in the near future.

sure. but, i dont think fs::du() would be any clear.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 11, 2012, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 11, 2012, 11:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 97
> > <https://reviews.apache.org/r/6704/diff/7/?file=151993#file151993line97>
> >
> >     As I mentioned before, I really would prefer that we mention that this number is really just an upper bound and changes based on disk usage, something like:
> >     
> >     Maximum amount of time to wait before cleaning up\n
> >     executor directories (e.g., 3days, 2 weeks, etc).\n
> >     Note that this delay may be shorter depending on\n
> >     available disk usage

I swear I fixed this before, but somehow got some how got lost in my rebase quagmire. fixed


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 49
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line49>
> >
> >     Previously I've embedded a struct like this directly into the class (just above the map that holds these). See src/zookeeper/group.cpp.

done


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 93
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line93>
> >
> >     The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
> >     
> >     if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {

i don't think that would work? how would we guarantee that timer.timeout().remaining() is 0? afaict, 'Timer' initializes 'timeout' to Clock::now() and by the time .remaining() is called, some time might have got elapsed!


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 154
> > <https://reviews.apache.org/r/6704/diff/7/?file=151995#file151995line154>
> >
> >     What is the reason to dispatch here?

Because I don't want remove() to modify 'paths', while prune() is iterating over it.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1543
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1543>
> >
> >     Please don't crash the slave if for whatever reason a file gets added to the directory that is not a number.

done


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1547
> > <https://reviews.apache.org/r/6704/diff/7/?file=151997#file151997line1547>
> >
> >     We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!

i'm guessing you have a reason to be inconclusive about this? fixed for now.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > third_party/libprocess/include/stout/os.hpp, line 586
> > <https://reviews.apache.org/r/6704/diff/7/?file=152001#file152001line586>
> >
> >     Any reason not to call this du?

I debated quite a bit about naming this guy. I didn't want to use 'du' because I didn't want give an impression that this is like the 'du' unix command. Because, its more like 'df', in the sense that this only cares about file systems not directories. On that note, I will change its argument to fs instead of path, to make this explicit. And I didn't want to call it 'df' because we are actually returning 'used' ratio, not the 'free' ratio.

I was torn between using 'usage' or 'capacity' (what 'df' spits out as the header), but decided go with the former because its more explicit.

Thoughts?


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 448
> > <https://reviews.apache.org/r/6704/diff/7/?file=151998#file151998line448>
> >
> >     Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?

yea, its tricky! short of mocking the slave, i'm not sure what other options we have.

'gc' itself is a private variable, so we cannot call its prune() method directly. may be we can pull up 'gc', to make it visible for testing? but, that doesnt' answer your concern about proliferating private variables up.

is there way we could 'catch' a dispatch, mid flight, and change its contents? that seems very hard and an over-kill.


> On Sept. 11, 2012, 5:10 a.m., Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 128
> > <https://reviews.apache.org/r/6704/diff/7/?file=151996#file151996line128>
> >
> >     I do not think we should proliferate this kind of thing. Please see the comment in the test itself as well.

i agree, see my comment below in test.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------


On Sept. 10, 2012, 6:39 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 6:39 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review11304
-----------------------------------------------------------



src/slave/flags.hpp
<https://reviews.apache.org/r/6704/#comment24317>

    As I mentioned before, I really would prefer that we mention that this number is really just an upper bound and changes based on disk usage, something like:
    
    Maximum amount of time to wait before cleaning up\n
    executor directories (e.g., 3days, 2 weeks, etc).\n
    Note that this delay may be shorter depending on\n
    available disk usage



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24316>

    Previously I've embedded a struct like this directly into the class (just above the map that holds these). See src/zookeeper/group.cpp.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24314>

    The timer.creator() seems weird to me. What about capturing exactly the semantics you care about doing something like:
    
    if (timer.timeout().remaining() == Seconds(0) || removalTime < timer.timeout()) {



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24312>

    const &



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24313>

    > 0



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24311>

    s/.//



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24310>

    const &



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment24315>

    What is the reason to dispatch here?



src/slave/slave.hpp
<https://reviews.apache.org/r/6704/#comment24309>

    I do not think we should proliferate this kind of thing. Please see the comment in the test itself as well.



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24298>

    Please don't crash the slave if for whatever reason a file gets added to the directory that is not a number.



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24308>

    We haven't come to a conclusion on the style for putting things inside of a CHECK (except for in tests). For now, please pull it out (like the deleted code) until we decide what we want to do. Thanks!



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment24318>

    s/atleast/at least/



src/tests/gc_tests.cpp
<https://reviews.apache.org/r/6704/#comment24307>

    Once _checkDiskUsage becomes a lambda, you won't be able to test it this way ... any thoughts on how to write a test then?



third_party/libprocess/include/stout/os.hpp
<https://reviews.apache.org/r/6704/#comment24303>

    Rather than the weird wrapping here, why not put all of open(... on the newline? For example:
    
    Try<int> fd =
      open(path, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR | S_IRGRP | S_IRWXO);



third_party/libprocess/include/stout/os.hpp
<https://reviews.apache.org/r/6704/#comment24302>

    Any reason not to call this du?


- Benjamin Hindman


On Sept. 10, 2012, 6:39 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Sept. 10, 2012, 6:39 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 380a2b9 
>   src/slave/flags.hpp e99ee8b 
>   src/slave/gc.hpp 3760d09 
>   src/slave/gc.cpp 5212a41 
>   src/slave/slave.hpp b7ab2ab 
>   src/slave/slave.cpp 3e2a8d5 
>   src/tests/gc_tests.cpp bb7416c 
>   third_party/libprocess/include/process/timeout.hpp ce0dea7 
>   third_party/libprocess/include/stout/duration.hpp e303d2c 
>   third_party/libprocess/include/stout/os.hpp df0f7ff 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 6:39 a.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

please ignore previous diff(6).


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Sept. 10, 2012, 6:37 a.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

updated and diff'ed against apache/trunk


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp 380a2b9 
  src/slave/flags.hpp e99ee8b 
  src/slave/gc.hpp 3760d09 
  src/slave/gc.cpp 5212a41 
  src/slave/slave.hpp b7ab2ab 
  src/slave/slave.cpp 3e2a8d5 
  src/tests/gc_tests.cpp bb7416c 
  src/tests/protobuf_io_tests.cpp 5cc167d 
  src/tests/protobuf_io_tests.cpp.orig PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp ce0dea7 
  third_party/libprocess/include/stout/duration.hpp e303d2c 
  third_party/libprocess/include/stout/os.hpp df0f7ff 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Aug. 28, 2012, 12:44 a.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

changed createUniqueWorkDirectory() , so that run number for new executor is always the highest.


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp ab83972 
  src/slave/flags.hpp 0c7917f 
  src/slave/gc.hpp 6704742 
  src/slave/gc.cpp 9c01024 
  src/slave/slave.hpp 10c537b 
  src/slave/slave.cpp 4efd41e 
  src/tests/gc_tests.cpp 2f0bdde 
  third_party/libprocess/include/stout/os.hpp b1eceb3 
  third_party/libprocess/include/stout/time.hpp 46d7077 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Aug. 27, 2012, 9:44 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

got rid of priority queue and hashmap, and using map


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp ab83972 
  src/slave/flags.hpp 0c7917f 
  src/slave/gc.hpp 6704742 
  src/slave/gc.cpp 9c01024 
  src/slave/slave.hpp 10c537b 
  src/slave/slave.cpp 4efd41e 
  src/tests/gc_tests.cpp 2f0bdde 
  third_party/libprocess/include/stout/os.hpp b1eceb3 
  third_party/libprocess/include/stout/time.hpp 46d7077 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/
-----------------------------------------------------------

(Updated Aug. 23, 2012, 7:43 p.m.)


Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.


Changes
-------

Bens' comments


Description
-------

Added the ability in the slave to GC executor directories based on the current disk usage.

Currently, it uses a very simple algorithm to decide which directories to delete.


This addresses bug MESOS-254.
    https://issues.apache.org/jira/browse/MESOS-254


Diffs (updated)
-----

  src/slave/constants.hpp ab83972 
  src/slave/flags.hpp 0c7917f 
  src/slave/gc.hpp 6704742 
  src/slave/gc.cpp 9c01024 
  src/slave/slave.hpp 10c537b 
  src/slave/slave.cpp 4efd41e 
  src/tests/gc_tests.cpp 2f0bdde 
  third_party/libprocess/include/stout/os.hpp b1eceb3 
  third_party/libprocess/include/stout/time.hpp 46d7077 

Diff: https://reviews.apache.org/r/6704/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1578
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1578>
> >
> >     s/garbageCollect/checkDiskUsage/
> 
> Vinod Kone wrote:
>     this function is actually scheduling things for garbage collection. i'm open to a better alternative than check() here.

The problem I have with "garbageCollect" is that this function is intrinsically tied to garbage collecting _after_ checking disk usage. Reading 'garbageCollect' to me does not imply: "check resource usage and then possibly garbage collect", it implies: "garbage collect resources".


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1565
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1565>
> >
> >     s/checkDiskUsage/getDiskUsage/
> 
> Vinod Kone wrote:
>     a void get() seems counter-intuitive. check() for me implies we are going to do something about it?

I agree, 'check' for me does imply we are going to do something about it, but that's not what this function does (that's what a later function does, what you called 'garbageCollect'). To me this is really a two-phase process: (1) get the current usage and (2) check if the current usage requires extra garbage collecting.


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1580
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1580>
> >
> >     It is a bit weird to see a Future<Try<>>, but I'm guessing you plan to eliminate that once you make the functions return a Future directly via the async stuff?
> 
> Vinod Kone wrote:
>     i'm not sure what you mean by 'eliminate'? after the async stuff is added, this would look like
>     
>     Future<Try<double> > usage = async(os::usage);
>

I was thinking you would 'async()' the statvfs function, chain with a then (or a libprocess process), and thus just return the future. The benefit of doing it the way you have done it is that we don't have to introduce Future into stout (which I don't want to).

So, I think the next abstraction that we need is something which takes a Future<Try<T>> and returns a Future<T> pulling out the Try value or the Try error! Yeah!


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10671
-----------------------------------------------------------


On Aug. 23, 2012, 7:43 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2012, 7:43 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
>   third_party/libprocess/include/stout/time.hpp 46d7077 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 96
> > <https://reviews.apache.org/r/6704/diff/2/?file=143655#file143655line96>
> >
> >     I think we should mention that this is actually just an upper bound, but as usage goes up, this number changes (i.e., explain the "age" function).

done


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 106
> > <https://reviews.apache.org/r/6704/diff/2/?file=143655#file143655line106>
> >
> >     No need for this newline.

done


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 126
> > <https://reviews.apache.org/r/6704/diff/2/?file=143655#file143655line126>
> >
> >     It will be a nice addition to be able to make all of these doubles either seconds, hours, or more likely, "duration" objects. Technical debt. :/

Added a TODO for duration abstraction. I will get to this in another review.

Also, filed https://issues.apache.org/jira/browse/MESOS-260


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 41
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line41>
> >
> >     Include vector in this file.

done


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 85
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line85>
> >
> >     Why the 'creator' bit?

to set the timer for the very first time. its a round-about way of checking the timer is not set. i'm open to a better/clear way to capture this?


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 101
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line101>
> >
> >     No need for ending period here! ;) Ask me later why.

done. but why? to be consistent with other cases where LOG statements end  with variables?


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 104
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line104>
> >
> >     {

done


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 166
> > <https://reviews.apache.org/r/6704/diff/2/?file=143658#file143658line166>
> >
> >     Is usage a percent?

nope, just a fraction


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1565
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1565>
> >
> >     s/checkDiskUsage/getDiskUsage/

a void get() seems counter-intuitive. check() for me implies we are going to do something about it?


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1578
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1578>
> >
> >     s/garbageCollect/checkDiskUsage/

this function is actually scheduling things for garbage collection. i'm open to a better alternative than check() here.


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1569
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1569>
> >
> >     Why not s/capacity/usage/?

i was going for what df prints out. but, i guess 'usage' is fine too. fixed


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1580
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1580>
> >
> >     It is a bit weird to see a Future<Try<>>, but I'm guessing you plan to eliminate that once you make the functions return a Future directly via the async stuff?

i'm not sure what you mean by 'eliminate'? after the async stuff is added, this would look like

Future<Try<double> > usage = async(os::usage);


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 59
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line59>
> >
> >     It would be more clear to factor the pair<string, Promise<bool>*> out into a struct.

i couldn't find any structs in state.hpp. did u mean class?

anyway, i used a simple struct


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/tests/gc_tests.cpp, line 162
> > <https://reviews.apache.org/r/6704/diff/2/?file=143660#file143660line162>
> >
> >     YES YES YES YES YES! I so much prefer this! We made createTask be so damn simple to reason about. Hurray!
> >     
> >     Note, eventually one could imagine making this a nice abstraction that all tests could use to creates tasks.

glad, u like it :)


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/gc.cpp, line 95
> > <https://reviews.apache.org/r/6704/diff/2/?file=143657#file143657line95>
> >
> >     I'll suggest 'reset' instead of 'next' for your deliberation.

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10671
-----------------------------------------------------------


On Aug. 21, 2012, 11:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1565
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1565>
> >
> >     s/checkDiskUsage/getDiskUsage/
> 
> Vinod Kone wrote:
>     a void get() seems counter-intuitive. check() for me implies we are going to do something about it?
> 
> Benjamin Hindman wrote:
>     I agree, 'check' for me does imply we are going to do something about it, but that's not what this function does (that's what a later function does, what you called 'garbageCollect'). To me this is really a two-phase process: (1) get the current usage and (2) check if the current usage requires extra garbage collecting.

To me, the fact that garbage collection happens asynchronously inside check() is an implementation detail. Ideally the garbageCollect() function should be private (I just pulled it up to public to make it visible for testing). In other words, garbageCollect() is something we don't expect users of Slave should be calling directly.

I'm also in favor of
s/garbageCollect()/_garbageCollect()/
s/check()/garbageCollect()/

thoughts?


> On Aug. 23, 2012, 6:18 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1578
> > <https://reviews.apache.org/r/6704/diff/2/?file=143659#file143659line1578>
> >
> >     s/garbageCollect/checkDiskUsage/
> 
> Vinod Kone wrote:
>     this function is actually scheduling things for garbage collection. i'm open to a better alternative than check() here.
> 
> Benjamin Hindman wrote:
>     The problem I have with "garbageCollect" is that this function is intrinsically tied to garbage collecting _after_ checking disk usage. Reading 'garbageCollect' to me does not imply: "check resource usage and then possibly garbage collect", it implies: "garbage collect resources".

See my comment above. 

Also, how about thinking garbageCollect() as doing garbage collection 'based on' resource usage. If it can take Try<double> instead of Future<Try <double>>, this would have been less confusing?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10671
-----------------------------------------------------------


On Aug. 23, 2012, 7:43 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 23, 2012, 7:43 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
>   third_party/libprocess/include/stout/time.hpp 46d7077 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave GC based on disk usage

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/6704/#review10671
-----------------------------------------------------------



src/slave/flags.hpp
<https://reviews.apache.org/r/6704/#comment22960>

    I think we should mention that this is actually just an upper bound, but as usage goes up, this number changes (i.e., explain the "age" function).



src/slave/flags.hpp
<https://reviews.apache.org/r/6704/#comment22948>

    No need for this newline.



src/slave/flags.hpp
<https://reviews.apache.org/r/6704/#comment22949>

    It will be a nice addition to be able to make all of these doubles either seconds, hours, or more likely, "duration" objects. Technical debt. :/



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22950>

    Include vector in this file.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22961>

    It would be more clear to factor the pair<string, Promise<bool>*> out into a struct.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22968>

    Why the 'creator' bit?



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22966>

    I'll suggest 'reset' instead of 'next' for your deliberation.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22967>

    No need for ending period here! ;) Ask me later why.



src/slave/gc.cpp
<https://reviews.apache.org/r/6704/#comment22963>

    {



src/slave/slave.hpp
<https://reviews.apache.org/r/6704/#comment22958>

    Is usage a percent?



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22954>

    s/checkDiskUsage/getDiskUsage/



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22956>

    Why not s/capacity/usage/?



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22955>

    s/garbageCollect/checkDiskUsage/



src/slave/slave.cpp
<https://reviews.apache.org/r/6704/#comment22957>

    It is a bit weird to see a Future<Try<>>, but I'm guessing you plan to eliminate that once you make the functions return a Future directly via the async stuff?



src/tests/gc_tests.cpp
<https://reviews.apache.org/r/6704/#comment22952>

    YES YES YES YES YES! I so much prefer this! We made createTask be so damn simple to reason about. Hurray!
    
    Note, eventually one could imagine making this a nice abstraction that all tests could use to creates tasks.


- Benjamin Hindman


On Aug. 21, 2012, 11:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/6704/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, John Sirois, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Added the ability in the slave to GC executor directories based on the current disk usage.
> 
> Currently, it uses a very simple algorithm to decide which directories to delete.
> 
> 
> This addresses bug MESOS-254.
>     https://issues.apache.org/jira/browse/MESOS-254
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp ab83972 
>   src/slave/flags.hpp 0c7917f 
>   src/slave/gc.hpp 6704742 
>   src/slave/gc.cpp 9c01024 
>   src/slave/slave.hpp 10c537b 
>   src/slave/slave.cpp 4efd41e 
>   src/tests/gc_tests.cpp 2f0bdde 
>   third_party/libprocess/include/stout/os.hpp b1eceb3 
> 
> Diff: https://reviews.apache.org/r/6704/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>