You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Sam Whitlock <ph...@gmail.com> on 2012/03/04 02:11:33 UTC

Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Charles Reiss.


Summary
-------

This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/

The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?

There are several changes that are not yet implemented in this patch. These changes are as follows:

* Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
* Code to send UsageMessages from the slave to the master.


Diffs
-----

  src/Makefile.am 1137a3e 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/Makefile.in 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/4167/diff


Testing
-------

Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting mocked out.

Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.


Thanks,

Sam


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Alex Degtiar <al...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5603
-----------------------------------------------------------


There are a couple comments we forgot to add about the work that still needs to be done.
1. We have not yet done any benchmarking to evaluate the performance cost of this additional monitoring.
2. Another configuration option to consider is the ability to enable/disable monitoring. Perhaps a configured sample rate of 0 should disable monitoring?

- Alex


On 2012-03-04 01:15:55, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-04 01:15:55)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1137a3e 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/
-----------------------------------------------------------

(Updated 2012-03-07 21:53:30.216474)


Review request for mesos, Benjamin Hindman and Charles Reiss.


Changes
-------

changing resource names to match the resources they are monitoring


Summary
-------

This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/

The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?

There are several changes that are not yet implemented in this patch. These changes are as follows:

* Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
* Code to send UsageMessages from the slave to the master.


This addresses bug MESOS-38.
    https://issues.apache.org/jira/browse/MESOS-38


Diffs (updated)
-----

  src/Makefile.am 1137a3e 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/main.cpp ac780c4 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/Makefile.in 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/4167/diff


Testing
-------

Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting mocked out.

Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.


Thanks,

Sam


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/
-----------------------------------------------------------

(Updated 2012-03-06 04:54:32.861529)


Review request for mesos, Benjamin Hindman and Charles Reiss.


Changes
-------

addressing some of the comments from charles


Summary
-------

This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/

The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?

There are several changes that are not yet implemented in this patch. These changes are as follows:

* Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
* Code to send UsageMessages from the slave to the master.


This addresses bug MESOS-38.
    https://issues.apache.org/jira/browse/MESOS-38


Diffs (updated)
-----

  src/Makefile.am 1137a3e 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/main.cpp ac780c4 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/Makefile.in 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/4167/diff


Testing
-------

Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting mocked out.

Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.


Thanks,

Sam


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.

> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/resource_monitor.cpp, line 67
> > <https://reviews.apache.org/r/4167/diff/2/?file=88033#file88033line67>
> >
> >     I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?
> 
> Sam Whitlock wrote:
>     I sorta made this choice unilaterally based on its similarity to the naming for cgroups infos.
>     
>     I don't really care what it is called and would certainly be willing to change it if it would make it more similar to the rest of mesos.
> 
> Sam Whitlock wrote:
>     I guess the choice to go with *_usage is because mem and cpus is used elsewhere in mesos for things that are not reported usage, and I wanted to draw a distinction.
> 
> Charles Reiss wrote:
>     I was assuming that usage measurements would always be kept in separate fields from requests. Is there a usecase where this doesn't look like this will be the case?
>     
>     Unless there's a reason to believe it would be deceptive, I'd prefer to name the usage measurements the same as the requested resource they are supposed to measure.

done


- Sam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------


On 2012-03-07 21:53:30, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-07 21:53:30)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1137a3e 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/main.cpp ac780c4 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Charles Reiss <wo...@gmail.com>.

> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/resource_monitor.cpp, line 67
> > <https://reviews.apache.org/r/4167/diff/2/?file=88033#file88033line67>
> >
> >     I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?
> 
> Sam Whitlock wrote:
>     I sorta made this choice unilaterally based on its similarity to the naming for cgroups infos.
>     
>     I don't really care what it is called and would certainly be willing to change it if it would make it more similar to the rest of mesos.
> 
> Sam Whitlock wrote:
>     I guess the choice to go with *_usage is because mem and cpus is used elsewhere in mesos for things that are not reported usage, and I wanted to draw a distinction.

I was assuming that usage measurements would always be kept in separate fields from requests. Is there a usecase where this doesn't look like this will be the case?

Unless there's a reason to believe it would be deceptive, I'd prefer to name the usage measurements the same as the requested resource they are supposed to measure.


- Charles


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------


On 2012-03-06 04:54:32, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-06 04:54:32)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1137a3e 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/main.cpp ac780c4 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.

> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/resource_monitor.cpp, line 67
> > <https://reviews.apache.org/r/4167/diff/2/?file=88033#file88033line67>
> >
> >     I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?

I sorta made this choice unilaterally based on its similarity to the naming for cgroups infos.

I don't really care what it is called and would certainly be willing to change it if it would make it more similar to the rest of mesos.


> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/slave.cpp, line 1446
> > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1446>
> >
> >     How is this deleted? Does this actually need to be on the heap?

no. before collect, I was doing things on the heap, but now I am using copying instead. thanks for catching this memory leak!


> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/slave.cpp, line 1448
> > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1448>
> >
> >     Suggest foreachpair

done


> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/slave.cpp, line 1469
> > <https://reviews.apache.org/r/4167/diff/2/?file=88035#file88035line1469>
> >
> >     const UsageMessage&

done


> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/http.cpp, line 76
> > <https://reviews.apache.org/r/4167/diff/2/?file=88024#file88024line76>
> >
> >     Why would we bother adding this before it does anything useful?

This was the start of displaying the usage through the webui.

I just bundled this in because it is a work-in-progress review (not for committing). For a future review that is more committable, this would not be part of it.

Only stuff in src/monitoring and the non-webui-related changes in src/slave would be part of the eventual committable review.


- Sam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------


On 2012-03-06 04:54:32, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-06 04:54:32)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1137a3e 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/main.cpp ac780c4 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.

> On 2012-03-05 23:04:17, Charles Reiss wrote:
> > src/slave/resource_monitor.cpp, line 67
> > <https://reviews.apache.org/r/4167/diff/2/?file=88033#file88033line67>
> >
> >     I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?
> 
> Sam Whitlock wrote:
>     I sorta made this choice unilaterally based on its similarity to the naming for cgroups infos.
>     
>     I don't really care what it is called and would certainly be willing to change it if it would make it more similar to the rest of mesos.

I guess the choice to go with *_usage is because mem and cpus is used elsewhere in mesos for things that are not reported usage, and I wanted to draw a distinction.


- Sam


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------


On 2012-03-06 04:54:32, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-06 04:54:32)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 1137a3e 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/main.cpp ac780c4 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Charles Reiss <wo...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/#review5605
-----------------------------------------------------------



src/slave/http.cpp
<https://reviews.apache.org/r/4167/#comment12223>

    Why would we bother adding this before it does anything useful?



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/4167/#comment12241>

    Make sure this entry exists and fail gracefully if it doesn't. (It's probably possible for an executor to die between the time usage is requested and when the ProcessInfo is deleted.)



src/slave/resource_monitor.cpp
<https://reviews.apache.org/r/4167/#comment12242>

    I assume that you've discussed calling this 'mem_usage' and 'cpu_usage' rather than 'mem' and 'cpus' with Ben? Can you explain the reasoning briefly?



src/slave/slave.cpp
<https://reviews.apache.org/r/4167/#comment12243>

    How is this deleted? Does this actually need to be on the heap?



src/slave/slave.cpp
<https://reviews.apache.org/r/4167/#comment12244>

    Suggest foreachpair



src/slave/slave.cpp
<https://reviews.apache.org/r/4167/#comment12245>

    const UsageMessage&


- Charles


On 2012-03-04 20:40:08, Sam Whitlock wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4167/
> -----------------------------------------------------------
> 
> (Updated 2012-03-04 20:40:08)
> 
> 
> Review request for mesos, Benjamin Hindman and Charles Reiss.
> 
> 
> Summary
> -------
> 
> This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.
> 
> Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/
> 
> The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit
> 
> The following items are ones where specific feedback is requested:
> 
> * A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
> * Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?
> 
> There are several changes that are not yet implemented in this patch. These changes are as follows:
> 
> * Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
> * Code has not been cleaned up to adhere to all style recommendations.
> * Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
> * Code to send UsageMessages from the slave to the master.
> 
> 
> This addresses bug MESOS-38.
>     https://issues.apache.org/jira/browse/MESOS-38
> 
> 
> Diffs
> -----
> 
>   src/monitoring/process_stats.hpp PRE-CREATION 
>   src/monitoring/resource_collector.hpp PRE-CREATION 
>   src/slave/http.cpp f03815d 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/isolation_module.cpp 5b7b4a2 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp d544625 
>   src/slave/main.cpp ac780c4 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 100b1e3 
>   src/slave/resource_monitor.hpp PRE-CREATION 
>   src/slave/resource_monitor.cpp PRE-CREATION 
>   src/slave/slave.hpp b1a07e9 
>   src/slave/slave.cpp ce8fda5 
>   src/tests/Makefile.in 6f51be4 
>   src/tests/proc_utils_tests.cpp PRE-CREATION 
>   src/tests/process_resource_collector_tests.cpp PRE-CREATION 
>   src/tests/resource_monitor_tests.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.cpp PRE-CREATION 
>   src/monitoring/process_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.cpp PRE-CREATION 
>   src/monitoring/linux/proc_utils.hpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
>   src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
>   src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
>   src/master/master.hpp 53551b0 
>   src/master/master.cpp 1d3961e 
>   src/messages/messages.proto 11a2c41 
>   src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
>   src/master/allocator.hpp 1ac435b 
>   src/master/http.cpp 591433a 
>   src/Makefile.am 1137a3e 
> 
> Diff: https://reviews.apache.org/r/4167/diff
> 
> 
> Testing
> -------
> 
> Test cases:
> * A test case exercising the basic monitoring code with a mocked-out collector.
> * The first of several tests for the process resource monitor, with the proc-based collecting mocked out.
> 
> Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.
> 
> 
> Thanks,
> 
> Sam
> 
>


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/
-----------------------------------------------------------

(Updated 2012-03-04 20:40:08.696183)


Review request for mesos, Benjamin Hindman and Charles Reiss.


Changes
-------

Diff from parent:
- reduced logging level for failing to read info from container to INFO (from ERROR), because it is not necessarily an error
- added command-line arg -f for frequency of usage info scraping (1/f is the time passed to the delay call)


Summary
-------

This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/

The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?

There are several changes that are not yet implemented in this patch. These changes are as follows:

* Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
* Code to send UsageMessages from the slave to the master.


This addresses bug MESOS-38.
    https://issues.apache.org/jira/browse/MESOS-38


Diffs (updated)
-----

  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/main.cpp ac780c4 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/Makefile.in 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/Makefile.am 1137a3e 

Diff: https://reviews.apache.org/r/4167/diff


Testing
-------

Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting mocked out.

Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.


Thanks,

Sam


Re: Review Request: (work in progress!) adding functionality to monitoring resource usage for each executor on the slave

Posted by Sam Whitlock <ph...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4167/
-----------------------------------------------------------

(Updated 2012-03-04 01:15:55.499253)


Review request for mesos, Benjamin Hindman and Charles Reiss.


Changes
-------

forgot to link relevant jira


Summary
-------

This mega-patch is intended to represent the partial completion of the slave monitoring functionality. It is not intended to be committed. Changes based on comments in this review will be reflected in future reviews that are smaller and more modular.

Proc utils is included in this patch, but is already under review here: https://reviews.apache.org/r/3050/

The relevant design doc can be found here: https://docs.google.com/document/d/14Wj9i6TpMR6cV3LL0ySjLjfZOq5QOQeybvt1gSyaQGs/edit

The following items are ones where specific feedback is requested:

* A better mechanism is needed to control the rate at which the slave asks each executor for its UsageMessage. This is currently hard-coded to be at 1 second intervals, but could potentially be read as a command-line option or from a config file. Is there a better or different way to pass in this value?
* Currently, UsageMessages are passed from a ResourceMonitor to the Slave using the Future construct, and used as containers that hold a snapshot of the latest usage. This is to prevent unnecessary marshalling and extra data structures, since messages will eventually be sent in the standard dispatch style from the slave to the master. Is it fine that we are using Protobuf messages in this way?

There are several changes that are not yet implemented in this patch. These changes are as follows:

* Sufficient tests cases have not yet been written for any component (resource monitor, lxc collector, and process collector).
* Code has not been cleaned up to adhere to all style recommendations.
* Process collector code needs to be updated to prevent CPU usage spikes when monitored sub-processes die.
* Code to send UsageMessages from the slave to the master.


This addresses bug MESOS-38.
    https://issues.apache.org/jira/browse/MESOS-38


Diffs
-----

  src/Makefile.am 1137a3e 
  src/master/allocator.hpp 1ac435b 
  src/master/http.cpp 591433a 
  src/master/master.hpp 53551b0 
  src/master/master.cpp 1d3961e 
  src/messages/messages.proto 11a2c41 
  src/monitoring/linux/lxc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/lxc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.hpp PRE-CREATION 
  src/monitoring/linux/proc_resource_collector.cpp PRE-CREATION 
  src/monitoring/linux/proc_utils.hpp PRE-CREATION 
  src/monitoring/linux/proc_utils.cpp PRE-CREATION 
  src/monitoring/process_resource_collector.hpp PRE-CREATION 
  src/monitoring/process_resource_collector.cpp PRE-CREATION 
  src/monitoring/process_stats.hpp PRE-CREATION 
  src/monitoring/resource_collector.hpp PRE-CREATION 
  src/slave/http.cpp f03815d 
  src/slave/isolation_module.hpp c896908 
  src/slave/isolation_module.cpp 5b7b4a2 
  src/slave/lxc_isolation_module.hpp b7beefe 
  src/slave/lxc_isolation_module.cpp d544625 
  src/slave/process_based_isolation_module.hpp f6f9554 
  src/slave/process_based_isolation_module.cpp 100b1e3 
  src/slave/resource_monitor.hpp PRE-CREATION 
  src/slave/resource_monitor.cpp PRE-CREATION 
  src/slave/slave.hpp b1a07e9 
  src/slave/slave.cpp ce8fda5 
  src/tests/Makefile.in 6f51be4 
  src/tests/proc_utils_tests.cpp PRE-CREATION 
  src/tests/process_resource_collector_tests.cpp PRE-CREATION 
  src/tests/resource_monitor_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/4167/diff


Testing
-------

Test cases:
* A test case exercising the basic monitoring code with a mocked-out collector.
* The first of several tests for the process resource monitor, with the proc-based collecting mocked out.

Some ad-hoc testing with log statements to ensure that the monitoring works end-to-end from both the container-based and process-based isolation modules.


Thanks,

Sam