Posted to dev@mesos.apache.org by Eric Biederman <eb...@xmission.com> on 2013/10/29 07:53:06 UTC

Review Request 15015: cgroup_isolator: Report additional memory statistics

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/
-----------------------------------------------------------

Review request for mesos, Ben Mahler, Ian Downes, and Vinod Kone.


Bugs: MESOS-758
    https://issues.apache.org/jira/browse/MESOS-758


Repository: mesos-git


Description
-------

This reports a basic breakdown of memory usage so that jobs whose page
cache expands to their configured hard limit will have the information
they need to size themselves.

This does not include all fields from memory.stat, or even all of the
total_ fields, but rather only the fields I have reviewed, seen a need
for, and have a reasonable expectation will report useful information
and work as designed.

Quite a few of the fields appear useful only for kernel-level debugging.

In particular, this patch exposes the fields:
memory.stat:total_cache         which reports the amount of file-backed memory in bytes
memory.stat:total_rss           which reports the amount of anonymous memory in bytes
memory.stat:total_mapped_file   which reports the amount of mapped file memory in bytes

Reading memory.stat is not quite atomic, and neither is computing the
values, since even in the simplest cases the values are sums of per-cpu
counters of the various statistics.  For the most part, however, the
numbers are a reasonable snapshot of the state of the cgroup at a
particular instant.


Diffs
-----

  include/mesos/mesos.proto fe1d82b 
  src/slave/cgroups_isolator.cpp fc36f1f 
  src/slave/monitor.cpp 9cb6256 
  src/tests/monitor_tests.cpp 3d3f8af 

Diff: https://reviews.apache.org/r/15015/diff/


Testing
-------

make and make check pass.
The code is a trivial extension of what already exists.


Thanks,

Eric Biederman


Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review28755
-----------------------------------------------------------

Ship it!



include/mesos/mesos.proto
<https://reviews.apache.org/r/15015/#comment55756>

    Since we're exposing this, mem_anonymous_bytes seems more obvious; we try to avoid abbreviations (except for acronyms).


- Ben Mahler




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review28769
-----------------------------------------------------------

Ship it!


Please mark this as submitted.  Let's fix the naming (s/anon/anonymous/) in a separate review if needed.

- Vinod Kone




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Ben Mahler <be...@gmail.com>.

> On Oct. 31, 2013, 10:18 p.m., Ian Downes wrote:
> > src/slave/cgroups_isolator.cpp, line 782
> > <https://reviews.apache.org/r/15015/diff/1/?file=372971#file372971line782>
> >
> >     Why don't we correct the naming while we're at it:
> >     
> >     total_memory      <- memory.usage_in_bytes, this is the primary memory statistic
> >     
> >     # and then the breakdown: 
> >     total_rss         <- total_rss from memory.stat
> >     total_cached      <- total_cache from memory.stat
> >     total_mapped_file <- total_mapped_file from memory.stat
> >     
> >     We probably don't need to include the total_ prefix?
> >     
> >     An executor's memory limit then matches with total_memory and, only if necessary, one can look at the breakdown.
> 
> Eric Biederman wrote:
>     I don't follow.
>     
>     We absolutely need the total_ prefix when reading the fields from the kernel.  And it is the kernel's memory.stat file where the naming is deceptive; I can't fix that with a mesos patch.
>     
>     I did what I can with a mesos patch, which is to name the fields in ResourceStatistics and in the http interface in a non-deceptive way.
>     
>     We absolutely do not even want to consider passing the kernel names straight through, as those names are wrong.  Especially the borked total_rss field name, as that is totally wrong.
>     
>     The current mapping implemented by this patch is:
>     
>     mem_rss_bytes  <- memory.usage_in_bytes
>     mem_anon_bytes <- memory.stat:total_rss
>     mem_file_bytes <- memory.stat:total_cache
>     mem_mapped_file_bytes <- memory.stat:total_mapped_file
>     
>     I would include mem_mapped_anon_bytes, but by definition that is exactly the same as mem_anon_bytes.
>     
>     I can see renaming mem_rss_bytes to something like mem_usage_bytes; however, that would break existing users of mesos, and breaking existing users is a bad idea.  Resident set size and usage are synonyms, so I don't see any pressing need for a rename there.
>     
>     
>
> 
> Ian Downes wrote:
>     Sorry, I meant we shouldn't expose them as total_*, but you're not doing that, so please disregard.
>     
>     With the patch's renaming we'd have
>     cgroups: mem_rss_bytes is not rss but the total
>     process: mem_rss_bytes is rss and not the total
>     
>     So running the same executor under different isolators would report (potentially) very different numbers?
> 
> Eric Biederman wrote:
>     The defining question is: How much memory must be available for a job to run successfully and at full performance?
>     
>     When this question is considered in theory, it is stated as: what is the minimum resident set size for a job to run successfully at full performance?
>     
>     The resident set size is how much of the job is in memory at one time, with the rest of the job on disk.
>     
>     
>     With my three patches applied the cgroups mem_rss_bytes is exactly the job's current rss.  The other numbers can help guess at the minimum rss, but what we measure is in fact how much of the job is in memory at the current instant.  That is the rss.
>     
>     
>     The rss of a process and the rss of a job are subtly different.  When measuring a single process the rss is just a count of the number of resident pages in the address space of the process.  For a job's rss, pages that are mapped into two different processes are counted only once, and pages that are used for other purposes, like I/O, are also counted.
>     
>     
>     The process isolator does not accurately compute the rss of a job.  The process isolator reports as mem_rss_bytes the simple sum of the rss of every process, which means the process isolator over-counts pages that are mapped into multiple processes and does not count cached files.  Depending on how different jobs use memory, the process isolator can be wildly inaccurate.
>     
>     
>     For a job consisting of a single process that does minimal file I/O, the process isolator will report a number that is close to how much memory is being used to run the process.
>     
>     If you are comparing a job of one process that does minimal file I/O, the two numbers reported by the cgroup isolator and the process isolator should be comparable.
>     
>     Approximations to the minimal rss computed using the cgroup isolator should work for sizing memory anywhere.
>     
>     Approximations to the minimal rss computed from the process isolator will only approach being accurate if the jobs do minimal file I/O and have very few file pages (think program text) shared between processes.
>     
>     
>     For sizing, the practical challenge with the cgroup isolator is that if a job is over-provisioned it may still use all of the memory up to its memory limit to cache files, because caching old file data is typically better than leaving the memory sitting empty.
>     
>     The process isolator can be so wildly inaccurate in the worst case that it is hard to think about.  In particular, the worst case for the process isolator looks something like:
>     
>     // Allocate and touch 1 GB of memory so the pages are mapped into our address space.
>     char *ptr = malloc(1024UL * 1024 * 1024);
>     memset(ptr, 1, 1024UL * 1024 * 1024);
>     for (;;) {
>        fork();
>     }
>     
>     In which case there will be a large number of processes, all with a 1GB rss, but each only using a few additional kilobytes per process.
>     
>     Does that help put these numbers into perspective?
>

Hey Eric, do you have advice on how the process isolator could be fixed to accurately report memory?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review27967
-----------------------------------------------------------




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Eric Biederman <eb...@xmission.com>.

> On Oct. 31, 2013, 10:18 p.m., Ian Downes wrote:
> > src/slave/cgroups_isolator.cpp, line 782
> > <https://reviews.apache.org/r/15015/diff/1/?file=372971#file372971line782>
> >
> >     Why don't we correct the naming while we're at it:
> >     
> >     total_memory      <- memory.usage_in_bytes, this is the primary memory statistic
> >     
> >     # and then the breakdown: 
> >     total_rss         <- total_rss from memory.stat
> >     total_cached      <- total_cache from memory.stat
> >     total_mapped_file <- total_mapped_file from memory.stat
> >     
> >     We probably don't need to include the total_ prefix?
> >     
> >     An executor's memory limit then matches with total_memory and, only if necessary, one can look at the breakdown.

I don't follow.

We absolutely need the total_ prefix when reading the fields from the kernel.  And it is the kernel's memory.stat file where the naming is deceptive; I can't fix that with a mesos patch.

I did what I can with a mesos patch, which is to name the fields in ResourceStatistics and in the http interface in a non-deceptive way.

We absolutely do not even want to consider passing the kernel names straight through, as those names are wrong.  Especially the borked total_rss field name, as that is totally wrong.

The current mapping implemented by this patch is:

mem_rss_bytes  <- memory.usage_in_bytes
mem_anon_bytes <- memory.stat:total_rss
mem_file_bytes <- memory.stat:total_cache
mem_mapped_file_bytes <- memory.stat:total_mapped_file

I would include mem_mapped_anon_bytes, but by definition that is exactly the same as mem_anon_bytes.

I can see renaming mem_rss_bytes to something like mem_usage_bytes; however, that would break existing users of mesos, and breaking existing users is a bad idea.  Resident set size and usage are synonyms, so I don't see any pressing need for a rename there.


- Eric


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review27967
-----------------------------------------------------------




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Eric Biederman <eb...@xmission.com>.

> On Oct. 31, 2013, 10:18 p.m., Ian Downes wrote:
> > src/slave/cgroups_isolator.cpp, line 782
> > <https://reviews.apache.org/r/15015/diff/1/?file=372971#file372971line782>
> >
> >     Why don't we correct the naming while we're at it:
> >     
> >     total_memory      <- memory.usage_in_bytes, this is the primary memory statistic
> >     
> >     # and then the breakdown: 
> >     total_rss         <- total_rss from memory.stat
> >     total_cached      <- total_cache from memory.stat
> >     total_mapped_file <- total_mapped_file from memory.stat
> >     
> >     We probably don't need to include the total_ prefix?
> >     
> >     An executor's memory limit then matches with total_memory and, only if necessary, one can look at the breakdown.
> 
> Eric Biederman wrote:
>     I don't follow.
>     
>     We absolutely need the total_ prefix when reading the fields from the kernel.  And it is the kernel's memory.stat file where the naming is deceptive; I can't fix that with a mesos patch.
>     
>     I did what I can with a mesos patch, which is to name the fields in ResourceStatistics and in the http interface in a non-deceptive way.
>     
>     We absolutely do not even want to consider passing the kernel names straight through, as those names are wrong.  Especially the borked total_rss field name, as that is totally wrong.
>     
>     The current mapping implemented by this patch is:
>     
>     mem_rss_bytes  <- memory.usage_in_bytes
>     mem_anon_bytes <- memory.stat:total_rss
>     mem_file_bytes <- memory.stat:total_cache
>     mem_mapped_file_bytes <- memory.stat:total_mapped_file
>     
>     I would include mem_mapped_anon_bytes, but by definition that is exactly the same as mem_anon_bytes.
>     
>     I can see renaming mem_rss_bytes to something like mem_usage_bytes; however, that would break existing users of mesos, and breaking existing users is a bad idea.  Resident set size and usage are synonyms, so I don't see any pressing need for a rename there.
>     
>     
>
> 
> Ian Downes wrote:
>     Sorry, I meant we shouldn't expose them as total_*, but you're not doing that, so please disregard.
>     
>     With the patch's renaming we'd have
>     cgroups: mem_rss_bytes is not rss but the total
>     process: mem_rss_bytes is rss and not the total
>     
>     So running the same executor under different isolators would report (potentially) very different numbers?

The defining question is: How much memory must be available for a job to run successfully and at full performance?

When this question is considered in theory, it is stated as: what is the minimum resident set size for a job to run successfully at full performance?

The resident set size is how much of the job is in memory at one time, with the rest of the job on disk.


With my three patches applied the cgroups mem_rss_bytes is exactly the job's current rss.  The other numbers can help guess at the minimum rss, but what we measure is in fact how much of the job is in memory at the current instant.  That is the rss.


The rss of a process and the rss of a job are subtly different.  When measuring a single process the rss is just a count of the number of resident pages in the address space of the process.  For a job's rss, pages that are mapped into two different processes are counted only once, and pages that are used for other purposes, like I/O, are also counted.


The process isolator does not accurately compute the rss of a job.  The process isolator reports as mem_rss_bytes the simple sum of the rss of every process, which means the process isolator over-counts pages that are mapped into multiple processes and does not count cached files.  Depending on how different jobs use memory, the process isolator can be wildly inaccurate.


For a job consisting of a single process that does minimal file I/O, the process isolator will report a number that is close to how much memory is being used to run the process.

If you are comparing a job of one process that does minimal file I/O, the two numbers reported by the cgroup isolator and the process isolator should be comparable.

Approximations to the minimal rss computed using the cgroup isolator should work for sizing memory anywhere.

Approximations to the minimal rss computed from the process isolator will only approach being accurate if the jobs do minimal file I/O and have very few file pages (think program text) shared between processes.


For sizing, the practical challenge with the cgroup isolator is that if a job is over-provisioned it may still use all of the memory up to its memory limit to cache files, because caching old file data is typically better than leaving the memory sitting empty.

The process isolator can be so wildly inaccurate in the worst case that it is hard to think about.  In particular, the worst case for the process isolator looks something like:

// Allocate and touch 1 GB of memory so the pages are mapped into our address space.
char *ptr = malloc(1024UL * 1024 * 1024);
memset(ptr, 1, 1024UL * 1024 * 1024);
for (;;) {
   fork();
}

In which case there will be a large number of processes, all with a 1GB rss, but each only using a few additional kilobytes per process.

Does that help put these numbers into perspective?


- Eric


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review27967
-----------------------------------------------------------




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Ian Downes <ia...@gmail.com>.

> On Oct. 31, 2013, 10:18 p.m., Ian Downes wrote:
> > src/slave/cgroups_isolator.cpp, line 782
> > <https://reviews.apache.org/r/15015/diff/1/?file=372971#file372971line782>
> >
> >     Why don't we correct the naming while we're at it:
> >     
> >     total_memory      <- memory.usage_in_bytes, this is the primary memory statistic
> >     
> >     # and then the breakdown: 
> >     total_rss         <- total_rss from memory.stat
> >     total_cached      <- total_cache from memory.stat
> >     total_mapped_file <- total_mapped_file from memory.stat
> >     
> >     We probably don't need to include the total_ prefix?
> >     
> >     An executor's memory limit then matches with total_memory and, only if necessary, one can look at the breakdown.
> 
> Eric Biederman wrote:
>     I don't follow.
>     
>     We absolutely need the total_ prefix when reading the fields from the kernel.  And it is the kernel's memory.stat file where the naming is deceptive; I can't fix that with a mesos patch.
>     
>     I did what I can with a mesos patch, which is to name the fields in ResourceStatistics and in the http interface in a non-deceptive way.
>     
>     We absolutely do not even want to consider passing the kernel names straight through, as those names are wrong.  Especially the borked total_rss field name, as that is totally wrong.
>     
>     The current mapping implemented by this patch is:
>     
>     mem_rss_bytes  <- memory.usage_in_bytes
>     mem_anon_bytes <- memory.stat:total_rss
>     mem_file_bytes <- memory.stat:total_cache
>     mem_mapped_file_bytes <- memory.stat:total_mapped_file
>     
>     I would include mem_mapped_anon_bytes, but by definition that is exactly the same as mem_anon_bytes.
>     
>     I can see renaming mem_rss_bytes to something like mem_usage_bytes; however, that would break existing users of mesos, and breaking existing users is a bad idea.  Resident set size and usage are synonyms, so I don't see any pressing need for a rename there.
>     
>     
>

Sorry, I meant we shouldn't expose them as total_*, but you're not doing that, so please disregard.

With the patch's renaming we'd have
cgroups: mem_rss_bytes is not rss but the total
process: mem_rss_bytes is rss and not the total

So running the same executor under different isolators would report (potentially) very different numbers?


- Ian


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review27967
-----------------------------------------------------------




Re: Review Request 15015: cgroup_isolator: Report additional memory statistics

Posted by Ian Downes <ia...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15015/#review27967
-----------------------------------------------------------



src/slave/cgroups_isolator.cpp
<https://reviews.apache.org/r/15015/#comment54420>

    Why don't we correct the naming while we're at it:
    
    total_memory      <- memory.usage_in_bytes, this is the primary memory statistic
    
    # and then the breakdown: 
    total_rss         <- total_rss from memory.stat
    total_cached      <- total_cache from memory.stat
    total_mapped_file <- total_mapped_file from memory.stat
    
    We probably don't need to include the total_ prefix?
    
    An executor's memory limit then matches with total_memory and, only if necessary, one can look at the breakdown.


- Ian Downes

