Posted to dev@mesos.apache.org by Ian Downes <ia...@gmail.com> on 2014/06/09 06:16:05 UTC

Re: Review Request 20817: Refactored cgroups::internal::Freezer

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/
-----------------------------------------------------------

(Updated June 8, 2014, 9:16 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Rebased.


Bugs: MESOS-473
    https://issues.apache.org/jira/browse/MESOS-473


Repository: mesos-git


Description
-------

    The Freezer tries to converge to the "FROZEN" state by repeatedly (every
    100 ms) writing "FROZEN" to the freezer.state control file (up to a
    configurable timeout). It assumes there are two possible reasons why a
    process does not get frozen during an attempt:

    1. It was in the middle of being forked and did not receive the signal;
    it will receive it at the next attempt.

    2. It is in uninterruptible sleep ("D" state). Normally, this is from
    device I/O or paging and is short-lived, in which case it'll be frozen on
    retry. However, processes can get stuck in "D" state, either because of
    a device issue, incorrect OOM handling, or kernel bugs. Under this
    scenario the correct behavior is to fail after a timeout (defaults to 60
    seconds).

    Freezer functions have been namespaced under cgroups::freezer.
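
    The convergence loop described above can be sketched roughly as follows.
    This is a minimal, synchronous illustration only, not the code in this
    patch: the function name freezeWithRetry and its callable parameters are
    hypothetical, the real Freezer is an asynchronous libprocess process, and
    the cgroup read/write steps are passed in as callables so the loop can be
    exercised without a real freezer.state file. It assumes the control file
    reads back "FROZEN" once every task in the cgroup is frozen and
    "FREEZING" while some are still pending.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <string>
#include <thread>

// Hypothetical helper (not the Mesos API): keep requesting the FROZEN state
// until the cgroup reports FROZEN or the timeout expires.
//
// writeState: writes a value to freezer.state (assumed to be supplied by the
//             caller, e.g. a thin wrapper around a file write).
// readState:  returns the current contents of freezer.state.
bool freezeWithRetry(
    const std::function<void(const std::string&)>& writeState,
    const std::function<std::string()>& readState,
    std::chrono::milliseconds interval = std::chrono::milliseconds(100),
    std::chrono::milliseconds timeout = std::chrono::seconds(60))
{
  assert(interval.count() > 0);

  const auto deadline = std::chrono::steady_clock::now() + timeout;

  while (true) {
    // Re-writing FROZEN on every attempt gives tasks that missed the
    // previous attempt (mid-fork, or briefly in "D" state) another chance.
    writeState("FROZEN");

    if (readState() == "FROZEN") {
      return true;
    }

    // A task stuck in "D" state never freezes, so give up after the timeout
    // instead of retrying forever.
    if (std::chrono::steady_clock::now() >= deadline) {
      return false;
    }

    std::this_thread::sleep_for(interval);
  }
}
```

    A caller would treat a false return as the failure case from point 2: the
    cgroup did not reach FROZEN within the timeout, most likely because a task
    is stuck in "D" state.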


Diffs (updated)
-----

  src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
  src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
  src/tests/cgroups_tests.cpp 5f674cd678e67f10bfef4620d927bb5af7c93753 

Diff: https://reviews.apache.org/r/20817/diff/


Testing
-------

make check # Linux


Thanks,

Ian Downes


Re: Review Request 20817: Refactored cgroups::internal::Freezer

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/#review45059
-----------------------------------------------------------


Bad patch!

Reviews applied: [20817]

Failed command: make -j3 distcheck GTEST_FILTER='' >/dev/null

Error:
 configure: WARNING: can not find python-boto
-------------------------------------------------------------------
mesos-ec2 services will not function.
-------------------------------------------------------------------
ev.c:1531:31: warning: 'ev_default_loop_ptr' initialized and declared 'extern' [enabled by default]
ev.c: In function 'evpipe_write':
ev.c:2160:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result]
ev.c:2172:17: warning: ignoring return value of 'write', declared with attribute warn_unused_result [-Wunused-result]
ev.c: In function 'pipecb':
ev.c:2193:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result]
ev.c:2207:16: warning: ignoring return value of 'read', declared with attribute warn_unused_result [-Wunused-result]
In file included from /usr/include/c++/4.6/ext/hash_set:61:0,
                 from src/glog/stl_logging.h:54,
                 from src/stl_logging_unittest.cc:34:
/usr/include/c++/4.6/backward/backward_warning.h:33:2: warning: #warning This file includes at least one deprecated or antiquated header which may be removed without further notice at a future date. Please use a non-deprecated interface with equivalent functionality instead. For a listing of replacement headers and interfaces, consult the file backward_warning.h. To disable this warning use -Wno-deprecated. [-Wcpp]
In file included from src/utilities.h:73:0,
                 from src/googletest.h:38,
                 from src/stl_logging_unittest.cc:48:
src/base/mutex.h:137:0: warning: "_XOPEN_SOURCE" redefined [enabled by default]
/usr/include/features.h:166:0: note: this is the location of the previous definition
warning: no files found matching 'Makefile' under directory 'docs'
warning: no files found matching 'indexsidebar.html' under directory 'docs'
ar: creating libleveldb.a
zip_safe flag not set; analyzing archive contents...
../../src/linux/cgroups.cpp:1449:40: error: 'EMPTY_WATCHER_RETRIES' was not declared in this scope
../../src/linux/cgroups.cpp: In member function 'process::Future<bool> cgroups::internal::TasksKiller::empty()':
../../src/linux/cgroups.cpp:1603:73: error: the default argument for parameter 3 of 'cgroups::internal::EmptyWatcher::EmptyWatcher(const string&, const string&, const Duration&, unsigned int)' has not yet been parsed
make[3]: *** [linux/libmesos_no_3rdparty_la-cgroups.lo] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [all] Error 2
make[1]: *** [all-recursive] Error 1
make: *** [distcheck] Error 1


- Mesos ReviewBot


On June 9, 2014, 4:16 a.m., Ian Downes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20817/
> -----------------------------------------------------------
> 
> (Updated June 9, 2014, 4:16 a.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-473
>     https://issues.apache.org/jira/browse/MESOS-473
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
>     The Freezer tries to converge to the "FROZEN" state by repeatedly (every
>     100 ms) writing "FROZEN" to the freezer.state control file (up to a
>     configurable timeout). It assumes there are two possible reasons why a
>     process does not get frozen during an attempt:
> 
>     1. It was in the middle of being forked and did not receive the signal;
>     it will receive it at the next attempt.
> 
>     2. It is in uninterruptible sleep ("D" state). Normally, this is from
>     device I/O or paging and is short-lived, in which case it'll be frozen on
>     retry. However, processes can get stuck in "D" state, either because of
>     a device issue, incorrect OOM handling, or kernel bugs. Under this
>     scenario the correct behavior is to fail after a timeout (defaults to 60
>     seconds).
> 
>     Freezer functions have been namespaced under cgroups::freezer.
> 
> 
> Diffs
> -----
> 
>   src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
>   src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
>   src/tests/cgroups_tests.cpp 5f674cd678e67f10bfef4620d927bb5af7c93753 
> 
> Diff: https://reviews.apache.org/r/20817/diff/
> 
> 
> Testing
> -------
> 
> make check # Linux
> 
> 
> Thanks,
> 
> Ian Downes
> 
>


Re: Review Request 20817: Refactored cgroups::internal::Freezer

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/
-----------------------------------------------------------

(Updated June 16, 2014, 2:58 p.m.)


Review request for mesos and Jie Yu.


Bugs: MESOS-473
    https://issues.apache.org/jira/browse/MESOS-473


Repository: mesos-git


Description
-------

    The Freezer tries to converge to the "FROZEN" state by repeatedly (every
    100 ms) writing "FROZEN" to the freezer.state control file (up to a
    configurable timeout). It assumes there are two possible reasons why a
    process does not get frozen during an attempt:

    1. It was in the middle of being forked and did not receive the signal;
    it will receive it at the next attempt.

    2. It is in uninterruptible sleep ("D" state). Normally, this is from
    device I/O or paging and is short-lived, in which case it'll be frozen on
    retry. However, processes can get stuck in "D" state, either because of
    a device issue, incorrect OOM handling, or kernel bugs. Under this
    scenario the correct behavior is to fail after a timeout (defaults to 60
    seconds).

    Freezer functions have been namespaced under cgroups::freezer.


Diffs (updated)
-----

  src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
  src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
  src/tests/cgroups_tests.cpp adcf46b1a4071b8f29475d2e20aae6dd8fdf09e7 

Diff: https://reviews.apache.org/r/20817/diff/


Testing
-------

make check # Linux


Thanks,

Ian Downes


Re: Review Request 20817: Refactored cgroups::internal::Freezer

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/
-----------------------------------------------------------

(Updated June 11, 2014, 3:21 p.m.)


Review request for mesos and Jie Yu.


Changes
-------

Terminate immediately on discard.


Bugs: MESOS-473
    https://issues.apache.org/jira/browse/MESOS-473


Repository: mesos-git


Description
-------

    The Freezer tries to converge to the "FROZEN" state by repeatedly (every
    100 ms) writing "FROZEN" to the freezer.state control file (up to a
    configurable timeout). It assumes there are two possible reasons why a
    process does not get frozen during an attempt:

    1. It was in the middle of being forked and did not receive the signal;
    it will receive it at the next attempt.

    2. It is in uninterruptible sleep ("D" state). Normally, this is from
    device I/O or paging and is short-lived, in which case it'll be frozen on
    retry. However, processes can get stuck in "D" state, either because of
    a device issue, incorrect OOM handling, or kernel bugs. Under this
    scenario the correct behavior is to fail after a timeout (defaults to 60
    seconds).

    Freezer functions have been namespaced under cgroups::freezer.


Diffs (updated)
-----

  src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
  src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
  src/tests/cgroups_tests.cpp 5f674cd678e67f10bfef4620d927bb5af7c93753 

Diff: https://reviews.apache.org/r/20817/diff/


Testing
-------

make check # Linux


Thanks,

Ian Downes


Re: Review Request 20817: Refactored cgroups::internal::Freezer

Posted by Jie Yu <yu...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/#review45397
-----------------------------------------------------------

Ship it!



src/linux/cgroups.hpp
<https://reviews.apache.org/r/20817/#comment80233>

    Is this used anywhere?



src/linux/cgroups.cpp
<https://reviews.apache.org/r/20817/#comment80236>

    Why not terminate the process immediately, as we did before? This causes an extra, unnecessary delay (if there are events in the queue) and may cause some weird issues.


- Jie Yu


On June 11, 2014, 5:27 p.m., Ian Downes wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20817/
> -----------------------------------------------------------
> 
> (Updated June 11, 2014, 5:27 p.m.)
> 
> 
> Review request for mesos and Jie Yu.
> 
> 
> Bugs: MESOS-473
>     https://issues.apache.org/jira/browse/MESOS-473
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
>     The Freezer tries to converge to the "FROZEN" state by repeatedly (every
>     100 ms) writing "FROZEN" to the freezer.state control file (up to a
>     configurable timeout). It assumes there are two possible reasons why a
>     process does not get frozen during an attempt:
> 
>     1. It was in the middle of being forked and did not receive the signal;
>     it will receive it at the next attempt.
> 
>     2. It is in uninterruptible sleep ("D" state). Normally, this is from
>     device I/O or paging and is short-lived, in which case it'll be frozen on
>     retry. However, processes can get stuck in "D" state, either because of
>     a device issue, incorrect OOM handling, or kernel bugs. Under this
>     scenario the correct behavior is to fail after a timeout (defaults to 60
>     seconds).
> 
>     Freezer functions have been namespaced under cgroups::freezer.
> 
> 
> Diffs
> -----
> 
>   src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
>   src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
>   src/tests/cgroups_tests.cpp 5f674cd678e67f10bfef4620d927bb5af7c93753 
> 
> Diff: https://reviews.apache.org/r/20817/diff/
> 
> 
> Testing
> -------
> 
> make check # Linux
> 
> 
> Thanks,
> 
> Ian Downes
> 
>


Re: Review Request 20817: Refactored cgroups::internal::Freezer

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20817/
-----------------------------------------------------------

(Updated June 11, 2014, 10:27 a.m.)


Review request for mesos and Jie Yu.


Changes
-------

Added back some code that was removed too early in this commit.


Bugs: MESOS-473
    https://issues.apache.org/jira/browse/MESOS-473


Repository: mesos-git


Description
-------

    The Freezer tries to converge to the "FROZEN" state by repeatedly (every
    100 ms) writing "FROZEN" to the freezer.state control file (up to a
    configurable timeout). It assumes there are two possible reasons why a
    process does not get frozen during an attempt:

    1. It was in the middle of being forked and did not receive the signal;
    it will receive it at the next attempt.

    2. It is in uninterruptible sleep ("D" state). Normally, this is from
    device I/O or paging and is short-lived, in which case it'll be frozen on
    retry. However, processes can get stuck in "D" state, either because of
    a device issue, incorrect OOM handling, or kernel bugs. Under this
    scenario the correct behavior is to fail after a timeout (defaults to 60
    seconds).

    Freezer functions have been namespaced under cgroups::freezer.


Diffs (updated)
-----

  src/linux/cgroups.hpp 21d87a0783c2edd653d28fa89c59773200ae647e 
  src/linux/cgroups.cpp 142ac437d6d53b678ef284bda46444e1615ff0d1 
  src/tests/cgroups_tests.cpp 5f674cd678e67f10bfef4620d927bb5af7c93753 

Diff: https://reviews.apache.org/r/20817/diff/


Testing
-------

make check # Linux


Thanks,

Ian Downes