Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/08/29 23:08:14 UTC

Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13904/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Jie Yu, David Mackey, and Vinod Kone.


Bugs: MESOS-671
    https://issues.apache.org/jira/browse/MESOS-671


Repository: mesos-git


Description
-------

See MESOS-671.

This appears to be what frequently triggers MESOS-662 on recovered slaves.


Diffs
-----

  src/slave/cgroups_isolator.cpp 676768e6b8bd13820467309814845257a9c47e02 

Diff: https://reviews.apache.org/r/13904/diff/


Testing
-------

make check

Catching this requires an integration test. The balloon test could probably be enhanced to have the slave recover, but I am punting on adding that complexity until we figure out a good testing strategy to ensure that recovered slaves operate the same as non-recovered slaves.


Thanks,

Ben Mahler


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13904/#review25785
-----------------------------------------------------------

Ship it!


- Vinod Kone


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Ben Mahler <be...@gmail.com>.

(Updated Sept. 4, 2013, 2:44 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebase.


Thanks,

Ben Mahler


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Ben Mahler <be...@gmail.com>.

(Updated Aug. 31, 2013, 12:29 a.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Rebase.


Thanks,

Ben Mahler


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Ben Mahler <be...@gmail.com>.

(Updated Aug. 29, 2013, 9:16 p.m.)


Review request for mesos, Benjamin Hindman, Jie Yu, David Mackey, and Vinod Kone.


Thanks,

Ben Mahler


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Ben Mahler <be...@gmail.com>.

(Updated Aug. 29, 2013, 9:15 p.m.)


Review request for mesos, Benjamin Hindman, Jie Yu, David Mackey, and Vinod Kone.


Thanks,

Ben Mahler


Re: Review Request 13904: Fixed CgroupsIsolator to listen for OOMs of recovered executors.

Posted by Ben Mahler <be...@gmail.com>.

(Updated Aug. 29, 2013, 9:15 p.m.)


Review request for mesos, Benjamin Hindman and Vinod Kone.


Changes
-------

Small cleanup of the existing recover() code.


Thanks,

Ben Mahler