You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by David Mackey <td...@booleanhaiku.com> on 2013/02/15 07:42:15 UTC

Review Request: Initial support for CPU limits via CFS Bandwidth Control.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs
-----

  src/slave/cgroups_isolation_module.cpp 14f549e 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17279
-----------------------------------------------------------


Can you resolve or drop the issues brought up in vinod and my reviews?

- Ben Mahler


On March 1, 2013, 9:24 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 1, 2013, 9:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17274
-----------------------------------------------------------



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36738>

    period at the end.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36739>

    s/, your/. Your/ ?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36740>

    quotes around cfsChanged.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36743>

    Considering mesos is responsible for manipulating resources, this should be a CHECK.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36744>

    Move these down to where they are used (line #815)


- Vinod Kone


On March 1, 2013, 9:24 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 1, 2013, 9:24 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17289
-----------------------------------------------------------



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36756>

    I replicated this comment from the subsystems loop as it applied here, but given that there's only one object in the list and that I left it assigned it to benh instead of myself I removed it. I'll re-add it.


- David Mackey


On March 2, 2013, 12:13 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 2, 2013, 12:13 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.

> On March 2, 2013, 12:16 a.m., Vinod Kone wrote:
> > src/slave/cgroups_isolation_module.cpp, lines 243-244
> > <https://reviews.apache.org/r/9464/diff/4-5/?file=264295#file264295line243>
> >
> >     Why did you remove this?

I replicated this comment from the subsystems loop as it applied here, but given that there's only one object in the list and that I left it assigned it to benh instead of myself I removed it. I'll re-add it.


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17284
-----------------------------------------------------------


On March 2, 2013, 12:13 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 2, 2013, 12:13 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17284
-----------------------------------------------------------

Ship it!



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment36753>

    Why did you remove this?


- Vinod Kone


On March 2, 2013, 12:13 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 2, 2013, 12:13 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review18412
-----------------------------------------------------------

Ship it!


Modulo comments below.


src/slave/cgroups_isolator.cpp
<https://reviews.apache.org/r/9464/#comment38611>

    Let's keep this consistent with other messages from initialize(), so:
    
    s/The/Your



src/slave/cgroups_isolator.cpp
<https://reviews.apache.org/r/9464/#comment38615>

    What is the "parent handler"? I think you can just say "Set the cpu shares as well" to be clear as to what the parent handler is doing.



src/slave/cgroups_isolator.cpp
<https://reviews.apache.org/r/9464/#comment38612>

    Does this need to be namespaced?



src/slave/flags.hpp
<https://reviews.apache.org/r/9464/#comment38613>

    s/ (cgroups_enable_cfs=true)//



src/slave/flags.hpp
<https://reviews.apache.org/r/9464/#comment38614>

    s/true/false
    
    We definitely don't want to enable by default initially.


- Ben Mahler


On March 26, 2013, 11:44 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 26, 2013, 11:44 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolator.hpp 1732c4e 
>   src/slave/cgroups_isolator.cpp ebc2843 
>   src/slave/flags.hpp 95969ae 
>   third_party/libprocess/third_party/stout/include/stout/multihashmap.hpp a8feabd 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 27, 2013, 3:11 a.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Updated per comments.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolator.hpp 1732c4e 
  src/slave/cgroups_isolator.cpp ebc2843 
  src/slave/flags.hpp 95969ae 
  third_party/libprocess/third_party/stout/include/stout/multihashmap.hpp a8feabd 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 26, 2013, 11:44 p.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Rebased off trunk and changed flag behavior to be an explicit flag instead of a "feature list." Nested handlers remain.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolator.hpp 1732c4e 
  src/slave/cgroups_isolator.cpp ebc2843 
  src/slave/flags.hpp 95969ae 
  third_party/libprocess/third_party/stout/include/stout/multihashmap.hpp a8feabd 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review17308
-----------------------------------------------------------


Getting close! A few high level comments:

1. I don't think we want to split the handlers, I thought it was cleaner when you were doing all the cpu handling inside cpusChanged(). With this change there are now two handlers and one uses the other. I think to keep things intuitive let's just place the logic inside cpusChanged and guard it with a check of the flag.

2. What else do you anticipate using this flag for? I would consider it to be a temporary measure until we can notify schedulers of the level of cpu isolation available. Hence wanting it to be something simple like --cgroups_enable_cfs. We will then be killing this flag when we make resources first class: https://issues.apache.org/jira/browse/MESOS-338.

- Ben Mahler


On March 2, 2013, 12:49 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated March 2, 2013, 12:49 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.hpp a04fc46 
>   src/slave/cgroups_isolation_module.cpp a779de8 
>   src/slave/flags.hpp d4aa045 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 2, 2013, 12:49 a.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Re-add comment.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolation_module.hpp a04fc46 
  src/slave/cgroups_isolation_module.cpp a779de8 
  src/slave/flags.hpp d4aa045 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 2, 2013, 12:13 a.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Vinod's comments


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolation_module.hpp a04fc46 
  src/slave/cgroups_isolation_module.cpp a779de8 
  src/slave/flags.hpp d4aa045 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 1, 2013, 9:24 p.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Latest revision.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolation_module.hpp a04fc46 
  src/slave/cgroups_isolation_module.cpp a779de8 
  src/slave/flags.hpp d4aa045 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated March 1, 2013, 9:16 p.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Converted to flag based enablement per comments.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolation_module.hpp a04fc46 
  src/slave/cgroups_isolation_module.cpp a779de8 
  src/slave/flags.hpp d4aa045 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.

> On Feb. 19, 2013, 7:38 p.m., Ben Mahler wrote:
> >
> 
> Ben Mahler wrote:
>     On second thought, after discussing this with ben and vinod. Let's add a flag
>     to enable CFS bandwidth control. This is going to have to be the short term way
>     we roll this out.
>     
>     Once you add the flag, then we can fail fast in the cgroups isolation module if it
>     doesn't exist. There's also no more need to check for existence later when trying
>     to set the control.
>     
>     Sound good?

Expose as a top level resource flag, eg cpu,cfs,memory,freezer? 


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16744
-----------------------------------------------------------


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.

> On Feb. 19, 2013, 7:38 p.m., Ben Mahler wrote:
> >

On second thought, after discussing this with ben and vinod. Let's add a flag
to enable CFS bandwidth control. This is going to have to be the short term way
we roll this out.

Once you add the flag, then we can fail fast in the cgroups isolation module if it
doesn't exist. There's also no more need to check for existence later when trying
to set the control.

Sound good?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16744
-----------------------------------------------------------


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 19, 2013, 7:38 p.m., Ben Mahler wrote:
> >
> 
> Ben Mahler wrote:
>     On second thought, after discussing this with ben and vinod. Let's add a flag
>     to enable CFS bandwidth control. This is going to have to be the short term way
>     we roll this out.
>     
>     Once you add the flag, then we can fail fast in the cgroups isolation module if it
>     doesn't exist. There's also no more need to check for existence later when trying
>     to set the control.
>     
>     Sound good?
> 
> David Mackey wrote:
>     Expose as a top level resource flag, eg cpu,cfs,memory,freezer?

something like a 'cgroups_enable_cfs' flag should be ok for now. we really can't add 'cfs' to the list of subsystems in 'cgroups_subsystems' flag because its technically not a subsystem :/


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16744
-----------------------------------------------------------


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16744
-----------------------------------------------------------



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35580>

    period.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35582>

    I'd kill this section, since CFS constants _are_ part of CPU subsystem contants, right?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35583>

    This seems evident enough to me from the variable name, let's kill this comment.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35584>

    Can you add this to the end of the next line?
    
    // 100ms (Linux default).



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35561>

    sorry for the misguidance, use CHECK_SOME instead:
    
    CHECK_SOME(exists) << "Failed to determine if cpu.cfs_quota_us exists";
    
    CHECK_SOME will append the error message internally (see check_some.hpp)



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35562>

    I'm guessing the lack of a semi-colon would have prevented this from compiling ;)



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35564>

    We typically use the word "Failed" in conjunction with a failed operation. Here the operation succeded, and cpu.cfs_quota_us is indeed not present on the system.
    
    So rather than causing alarm to the log consumer (seeing a log line with "Failed"), I think it's more appropriate to print the fact that 'cpu.cfs_quota_us' is not present.
    
    Since we're now using cfsbc if present, WARNING is still the right thing (for now).



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35571>

    s/.//
    
    We consistently don't use periods on the end of log messages. This is for several reasons:
    
    1. Log messages with an expression at the end, are more clunky to append the period to:
    LOG(INFO) << "Updated 'cpu.shares' to " << shares << ".";
    2. When printing variables, periods can be slightly ambiguous to the uninitiated:
    "Updated foo to ."
    Was foo updated to "."?
    Was foo updated to "", and the log line ended with a period?
    We use single quotes around values on many occasions to avoid ambiguity, but there are still many cases where we do not.
    3. Some information looks a bit strange with a trailing period: "to 4.0."



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35585>

    The !exists.get() case is when cfs is unavailable, right?
    
    This case is a failed operation. I would return Try::error in this case instead (with a "Failed" message).
    
    If you want you can add the else case below to print a WARNING that cfs isn't being applied (when !exists.get()).



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35575>

    We use 2 space indents on line continuation.
    Unless the line continues on a open paren.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35573>

    Thanks!


- Ben Mahler


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16745
-----------------------------------------------------------



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35565>

    period at the end.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35566>

    period.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35567>

    period



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35570>

    We actually have a Duration abstraction to store time and easily convert to different units. Can you use that instead?
    
    also period at the end.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35574>

    Shouldn't this be a CHECK or LOG(FATAL)?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35577>

    thank you.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35579>

    If you change the check in initialize to LOG(FATAL), you can just do CHECK_SOME here.


- Vinod Kone


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/
-----------------------------------------------------------

(Updated Feb. 18, 2013, 11:51 p.m.)


Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.


Changes
-------

Updated per BenM's comments.


Description
-------

Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.

CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.

There are 4 approaches one could take:
1) Use the CFS bandwidth limiting if the feature is present
2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
4) Add an additional control flag via some other means

Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.

This diff takes option 1. 


This addresses bug MESOS-315.
    https://issues.apache.org/jira/browse/MESOS-315


Diffs (updated)
-----

  src/slave/cgroups_isolation_module.cpp 14f549e 

Diff: https://reviews.apache.org/r/9464/diff/


Testing
-------

make check + additional testing


Thanks,

David Mackey


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by David Mackey <td...@booleanhaiku.com>.

> On Feb. 15, 2013, 11:17 p.m., Ben Mahler wrote:
> > src/slave/cgroups_isolation_module.cpp, line 80
> > <https://reviews.apache.org/r/9464/diff/1/?file=258974#file258974line80>
> >
> >     Is there any way to pull the period from a linux header? Will the bandwidth control contain this value initially, or -1?
> >     
> >     Do you see the default ever changing?

It will likely never change. The process to change a default like this is much too contentious and political for anyone to try. Sadly, this is defined fairly deep in the kernel and is not exposed.

The period is set to 100ms on boot so it technically doesn't need to be "initialized" by this code if it is to remain at that value. 


- David


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16667
-----------------------------------------------------------


On Feb. 18, 2013, 11:51 p.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2013, 11:51 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>


Re: Review Request: Initial support for CPU limits via CFS Bandwidth Control.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9464/#review16667
-----------------------------------------------------------


This looks pretty good, some higher level comments:

In terms of the approach to turn on cfs bandwidth control, I think it's important
to think of the longer term goal here. We probably don't want to force all tasks
under a slave to be run with hard cpu isolation. Some tasks, like mapreduce jobs,
will want to use as much cpu as possible and are great candidates for soft cpu
isolation. Other tasks, like throughput sensitive tasks, will want the determinism
of hard cpu isolation.

I'm convinced if we turned on hard cpu isolation for an entire production cluster,
we'd see a lot of problems with under-estimated cpu requirements.

This may mean pushing some optional responsibility to the scheduler. Offering
the ability to request the type of isolation (e.g. SOFT, HARD, PINNED). Schedulers
can then surface that complexity to their end-users however they like.

Unfortunately, we don't have the bits in place for that right now, but we should
get to it when we want to release this.

So for now, I think approach (1) is reasonable. But keep the above in mind, and
update any TODOs as appropriate.


src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35372>

    can you put a newline here? that way we have three logical sections of constants
    
    Feel free to add a title comment for these two like you did with cfs.
    
    e.g.
    
    // Memory subsystem constants.
    const size_t MIN_MEMORY_MB = 32 * Megabyte;



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35373>

    were these meant to match the variable name or the subsystem control name?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35371>

    Is there any way to pull the period from a linux header? Will the bandwidth control contain this value initially, or -1?
    
    Do you see the default ever changing?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35370>

    I think we'll stick with the default period, so you can kill this TODO. If we end up needing to mess with the default we can consider that later.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35358>

    Unused, were you planning on making use of this?
    
    In general, I don't see us needing to completely disable quota once we've turned it on.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35369>

    Can you update the comment and TODO here to reflect my higher level comments?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35367>

    need to also check exists.isError()



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35368>

    s/ERROR/WARNING
    
    The exists.isError case I would indeed consider an error, however.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35366>

    I think these three lines can be distilled down to:
    
    // Apply CFS bandwidth limiting if available.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35360>

    kill extra space before 'exists'



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35362>

    newline and space after the 'if'
    
    you also need to check if exists is an error, otherwise get() will fail



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35363>

    s/cpuCFSQuota/cfsQuota
    
    On a philosophical note: We typically try to get variable names down to 1 word, as long as the context is enough to make it clear. I think the above "cpuShares" could have been sufficient as "shares" which makes this seem more appropriate as just "quota"



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35365>

    Let's make this a NOTE and distill it:
    
    // NOTE: cfs_quota_us is dependent upon cfs_period_us,
    // so we have have to write the period first.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35364>

    newline
    
    This isn't a hard convention but I think we agreed at one point it is more legible to use newlines between the assignment and the Try/Option/Result checking.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35361>

    newline



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/9464/#comment35359>

    Can you merge the previous cfs LOG(INFO) into this one? That way, we can see the updated controls for the executor in a single log line.


- Ben Mahler


On Feb. 15, 2013, 6:42 a.m., David Mackey wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/9464/
> -----------------------------------------------------------
> 
> (Updated Feb. 15, 2013, 6:42 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Vinod Kone, and Ben Mahler.
> 
> 
> Description
> -------
> 
> Initial support for CPU "hard limits" via CFS Bandwidth Control in cgroups.
> 
> CFS is unique relative to existing Mesos cgroups support in that it is a "subfeature" of an already supported cgroups subsystem, cpu. Also, there are two "tunables" for configuring CFS bandwidth limiting.
> 
> There are 4 approaches one could take:
> 1) Use the CFS bandwidth limiting if the feature is present
> 2) Expose as separate flag, eg "cpu,cfs,memory,freezer"
> 3) Add feature flag support to subsystems via an additional delimiter, eg "cpu+cfs,memory,freezer". 
> 4) Add an additional control flag via some other means
> 
> Option 2's downside breaks the 1:1 mapping between cgroups subsystems and a cgroups resource flag.
> Option 3's downside is it greatly increases complexity of parsing cgroups subsystem flags.
> 
> This diff takes option 1. 
> 
> 
> This addresses bug MESOS-315.
>     https://issues.apache.org/jira/browse/MESOS-315
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolation_module.cpp 14f549e 
> 
> Diff: https://reviews.apache.org/r/9464/diff/
> 
> 
> Testing
> -------
> 
> make check + additional testing
> 
> 
> Thanks,
> 
> David Mackey
> 
>