You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/03/07 04:18:46 UTC

Review Request: Added child process monitoring to process based isolation module.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Vinod Kone.


Description
-------

See subject.


This addresses bug MESOS-370.
    https://issues.apache.org/jira/browse/MESOS-370


Diffs
-----

  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 

Diff: https://reviews.apache.org/r/9795/diff/


Testing
-------

make check and added tests in a subsequent review


Thanks,

Ben Mahler


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17767
-----------------------------------------------------------

Ship it!


Ship It!

- Vinod Kone


On March 12, 2013, 9:36 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 12, 2013, 9:36 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process isolator.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review18546
-----------------------------------------------------------

Ship it!



src/slave/process_isolator.cpp
<https://reviews.apache.org/r/9795/#comment38871>

    Fix me! ;)


- Benjamin Hindman


On March 28, 2013, 9:26 p.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 28, 2013, 9:26 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/
-----------------------------------------------------------

(Updated March 30, 2013, 10:21 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Benh review, rebased off trunk.


Summary (updated)
-----------------

Added child process monitoring to process based isolation module.


Description
-------

See subject.


This addresses bug MESOS-370.
    https://issues.apache.org/jira/browse/MESOS-370


Diffs (updated)
-----

  src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 

Diff: https://reviews.apache.org/r/9795/diff/


Testing
-------

make check and added tests in a subsequent review


Thanks,

Ben Mahler


Re: Review Request: Added child process monitoring to process isolator.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/
-----------------------------------------------------------

(Updated March 28, 2013, 9:26 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebased off trunk.


Summary (updated)
-----------------

Added child process monitoring to process isolator.


Description
-------

See subject.


This addresses bug MESOS-370.
    https://issues.apache.org/jira/browse/MESOS-370


Diffs (updated)
-----

  src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 

Diff: https://reviews.apache.org/r/9795/diff/


Testing
-------

make check and added tests in a subsequent review


Thanks,

Ben Mahler


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/
-----------------------------------------------------------

(Updated March 12, 2013, 9:36 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebased off trunk, still needs ship its.


Description
-------

See subject.


This addresses bug MESOS-370.
    https://issues.apache.org/jira/browse/MESOS-370


Diffs (updated)
-----

  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 

Diff: https://reviews.apache.org/r/9795/diff/


Testing
-------

make check and added tests in a subsequent review


Thanks,

Ben Mahler


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/
-----------------------------------------------------------

(Updated March 8, 2013, 2:27 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Reviews, rebased off trunk.


Description
-------

See subject.


This addresses bug MESOS-370.
    https://issues.apache.org/jira/browse/MESOS-370


Diffs (updated)
-----

  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 

Diff: https://reviews.apache.org/r/9795/diff/


Testing
-------

make check and added tests in a subsequent review


Thanks,

Ben Mahler


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Vinod Kone <vi...@gmail.com>.

> On March 7, 2013, 6:25 p.m., Vinod Kone wrote:
> > src/slave/process_based_isolation_module.cpp, line 332
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line332>
> >
> >     I think it makes more sense to return an error here instead of returning a wrong value to the user!
> 
> Ben Mahler wrote:
>     I don't think I agree, because gathering all the children, and obtaining the status of each children does not happen atomically. Therefore, if there are any processes that go away during this time, we can't stat them! Maybe it's sufficiently rare, but I'd rather get an approximate answer when this happens, rather than give up, no?
>     
>     Say there are 50 processes, and while we stat, one goes away. I'd rather get the usage for the remaining 49 processes than nothing at all.

Good point. Makes sense.


> On March 7, 2013, 6:25 p.m., Vinod Kone wrote:
> > src/slave/process_based_isolation_module.cpp, line 318
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line318>
> >
> >     s/children/descendants/
> 
> Ben Mahler wrote:
>     I think it's clear that it's recursive children, no?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17556
-----------------------------------------------------------


On March 8, 2013, 2:27 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 8, 2013, 2:27 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Ben Mahler <be...@gmail.com>.

> On March 7, 2013, 6:25 p.m., Vinod Kone wrote:
> > src/slave/process_based_isolation_module.cpp, line 332
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line332>
> >
> >     I think it makes more sense to return an error here instead of returning a wrong value to the user!

I don't think I agree, because gathering all the children, and obtaining the status of each children does not happen atomically. Therefore, if there are any processes that go away during this time, we can't stat them! Maybe it's sufficiently rare, but I'd rather get an approximate answer when this happens, rather than give up, no?

Say there are 50 processes, and while we stat, one goes away. I'd rather get the usage for the remaining 49 processes than nothing at all.


> On March 7, 2013, 6:25 p.m., Vinod Kone wrote:
> > src/slave/process_based_isolation_module.cpp, line 318
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line318>
> >
> >     s/children/descendants/

I think it's clear that it's recursive children, no?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17556
-----------------------------------------------------------


On March 7, 2013, 3:18 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 7, 2013, 3:18 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17556
-----------------------------------------------------------



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37217>

    +1 pull this up to where pageSize is initialized.



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37220>

    s/children/descendants/



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37218>

    i usually like to put ": " on the same line as error. but upto you.



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37219>

    s/all the child process usage/the usage of all child processes/



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37221>

    umm..i definitely prefer ancestor to parent!
    
    



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37222>

    I think it makes more sense to return an error here instead of returning a wrong value to the user!



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37223>

    ditto.


- Vinod Kone


On March 7, 2013, 3:18 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 7, 2013, 3:18 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Ben Mahler <be...@gmail.com>.

> On March 7, 2013, 8:17 a.m., Benjamin Hindman wrote:
> > src/slave/process_based_isolation_module.cpp, lines 331-332
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line331>
> >
> >     s/descendant/child/ and s/ancestor/parent/?

Went with parent and descendant.


> On March 7, 2013, 8:17 a.m., Benjamin Hindman wrote:
> > src/slave/process_based_isolation_module.cpp, line 344
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line344>
> >
> >     If we ever start to integrate some of the monitoring into the webui it would be swell to be able to play with it on our laptops. For that, I suggest some os::shell calls here. See src/scripts/killtree.sh for inspiration (I could even imagine a minor refactor of killtree.sh so you can easily just print all pids in the tree without actually killing them, i.e., by sending them a CONT signal or 0).

Agreed, I'm planning on implementing an OSX proc interface. Maybe that interface shells out as you said, but I was planning on doing it programmatically.


> On March 7, 2013, 8:17 a.m., Benjamin Hindman wrote:
> > src/slave/process_based_isolation_module.cpp, line 307
> > <https://reviews.apache.org/r/9795/diff/1/?file=267811#file267811line307>
> >
> >     Adding on to a comment in another review, this would then be 'proc::status(info->pid)' which sounds good to me and a /proc/[pid]/status actually exists and is just the prettyfied version of /proc/[pid]/status.

Done in the earlier review. I find it easier to linearly progress through the reviews, so adding comments like this in the right review helps me a lot!


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17522
-----------------------------------------------------------


On March 7, 2013, 3:18 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 7, 2013, 3:18 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>


Re: Review Request: Added child process monitoring to process based isolation module.

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9795/#review17522
-----------------------------------------------------------



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37194>

    I don't understand why these are PCHECKs? Maybe if you did this after each of the sysconf's that would make sense. But then I'd probably return a failed future!



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37189>

    Adding on to a comment in another review, this would then be 'proc::status(info->pid)' which sounds good to me and a /proc/[pid]/status actually exists and is just the prettyfied version of /proc/[pid]/status.



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37190>

    s/descendant/child/ and s/ancestor/parent/?



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37192>

    What's the thought behind this formatting? 



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37193>

    If we ever start to integrate some of the monitoring into the webui it would be swell to be able to play with it on our laptops. For that, I suggest some os::shell calls here. See src/scripts/killtree.sh for inspiration (I could even imagine a minor refactor of killtree.sh so you can easily just print all pids in the tree without actually killing them, i.e., by sending them a CONT signal or 0).



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/9795/#comment37191>

    You don't need the 'struct'.


- Benjamin Hindman


On March 7, 2013, 3:18 a.m., Ben Mahler wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9795/
> -----------------------------------------------------------
> 
> (Updated March 7, 2013, 3:18 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Vinod Kone.
> 
> 
> Description
> -------
> 
> See subject.
> 
> 
> This addresses bug MESOS-370.
>     https://issues.apache.org/jira/browse/MESOS-370
> 
> 
> Diffs
> -----
> 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
> 
> Diff: https://reviews.apache.org/r/9795/diff/
> 
> 
> Testing
> -------
> 
> make check and added tests in a subsequent review
> 
> 
> Thanks,
> 
> Ben Mahler
> 
>