Posted to reviews@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2015/08/02 00:30:41 UTC

Re: Review Request 36627: Fixed cgroups oom killer and memory pressure tests on Ubuntu 14.04.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36627/#review93849
-----------------------------------------------------------

Ship it!



src/tests/containerizer/cgroups_tests.cpp (lines 1143 - 1146)
<https://reviews.apache.org/r/36627/#comment148258>

    But doesn't this mean we might stay in an infinite loop forever? I'm assuming that somehow you figured out that things are just delayed but it eventually converges correctly, is that the case? Can you leave a comment on why not incrementing 'i' won't just make this an infinite loop?



src/tests/containerizer/memory_test_helper.cpp (line 93)
<https://reviews.apache.org/r/36627/#comment148259>

    Any specific reason to memset it to 0 instead of 1?


- Benjamin Hindman


On July 29, 2015, 11:14 p.m., Artem Harutyunyan wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/36627/
> -----------------------------------------------------------
> 
> (Updated July 29, 2015, 11:14 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Joris Van Remoortere.
> 
> 
> Bugs: MESOS-2660
>     https://issues.apache.org/jira/browse/MESOS-2660
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/tests/containerizer/cgroups_tests.cpp caecd5dfa3fef33dba35cfc1b5934a11e2cc961a 
>   src/tests/containerizer/memory_test_helper.cpp 48a35632786963f484f66642b5c67afd4f7a89cc 
> 
> Diff: https://reviews.apache.org/r/36627/diff/
> 
> 
> Testing
> -------
> 
> It seems there is still one more cgroups memory test failing on my box. I'd like to fix that too and commit it together with this one.
> 
> sudo make check
> 
> Verified that the process actually gets killed by oom-killer:
> 
> ```
> # tail -f /var/log/syslog
> 
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052405] lt-memory-test- invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052408] lt-memory-test- cpuset=/ mems_allowed=0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052411] CPU: 7 PID: 76599 Comm: lt-memory-test- Tainted: G           OE 3.16.0-41-generic #57~14.04.1-Ubuntu
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052413] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052413]  ffff88022efc1000 ffff8801fd2efc30 ffffffff81765721 ffff880231f10a30
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052415]  ffff8801fd2efcb8 ffffffff8175f2d5 ffff8802366f30c0 ffff8801e9405b00
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052416]  ffff8801fd2efc80 ffffffff81165067 ffff880231f10ee8 ffff880231f10a30
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052418] Call Trace:
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052424]  [<ffffffff81765721>] dump_stack+0x45/0x56
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052426]  [<ffffffff8175f2d5>] dump_header+0x7f/0x1f1
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052430]  [<ffffffff81165067>] ? find_lock_task_mm+0x47/0xa0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052432]  [<ffffffff811654e5>] oom_kill_process+0x205/0x360
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052434]  [<ffffffff812eb975>] ? security_capable_noaudit+0x15/0x20
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052437]  [<ffffffff811ca2e1>] mem_cgroup_oom_synchronize+0x581/0x5e0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052439]  [<ffffffff811c97c0>] ? mem_cgroup_try_charge_mm+0xa0/0xa0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052440]  [<ffffffff81165ce4>] pagefault_out_of_memory+0x14/0x80
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052442]  [<ffffffff8175d97f>] mm_fault_error+0x67/0x140
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052445]  [<ffffffff8105b28c>] __do_page_fault+0x4ec/0x560
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052449]  [<ffffffff810a6208>] ? __enqueue_entity+0x78/0x80
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052450]  [<ffffffff810a7f35>] ? set_next_entity+0x95/0xb0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052452]  [<ffffffff81011627>] ? __switch_to+0x167/0x580
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052454]  [<ffffffff8105b331>] do_page_fault+0x31/0x70
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052456]  [<ffffffff8176fe68>] page_fault+0x28/0x30
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052457] Task in /mesos_test killed as a result of limit of /mesos_test
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052459] memory: usage 65536kB, limit 65536kB, failcnt 24
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052460] memory+swap: usage 0kB, limit 18014398509481983kB, failcnt 0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052461] kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052461] Memory cgroup stats for /mesos_test: cache:0KB rss:65536KB rss_huge:63488KB mapped_file:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:65536KB
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052467] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052574] [76599]     0 76599    82452    31766     122        1             0 lt-memory-test-
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052575] Memory cgroup out of memory: Kill process 76599 (lt-memory-test-) score 14 or sacrifice child
> Jul 22 14:56:00 harutyunyan-virtual-machine kernel: [17440.052577] Killed process 76599 (lt-memory-test-) total-vm:329808kB, anon-rss:67988kB, file-rss:59076kB
> 
> 
> ```
> 
> 
> Thanks,
> 
> Artem Harutyunyan
> 
>


Re: Review Request 36627: Fixed cgroups oom killer and memory pressure tests on Ubuntu 14.04.

Posted by Artem Harutyunyan <ar...@mesosphere.io>.

> On Aug. 1, 2015, 3:30 p.m., Benjamin Hindman wrote:
> > src/tests/containerizer/cgroups_tests.cpp, lines 1143-1146
> > <https://reviews.apache.org/r/36627/diff/5/?file=1024882#file1024882line1143>
> >
> >     But doesn't this mean we might stay in an infinite loop forever? I'm assuming that somehow you figured out that things are just delayed but it eventually converges correctly, is that the case? Can you leave a comment on why not incrementing 'i' won't just make this an infinite loop?

The reasoning here is that the counters will eventually become stable (that should happen on the second read), at which point the normal course of action (including incrementing the loop counter) resumes.

There is a comment about that a couple of lines above (lines 1120-1123):
```
    // We need to know the readings are the same as last time to be
    // sure they are stable, because the reading is not atomic. For
    // example, the medium could turn positive after we read low to be
    // 0, but this should be fixed by the next read immediately.
```
I will amend this comment too.
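
To make the "re-read until stable" idea concrete, here is a minimal sketch of it in isolation. It is not the code from cgroups_tests.cpp: `Counters`, `readStable` and the `maxReads` cap are hypothetical names introduced for illustration, and the cap is my own addition to show one way of ruling out the infinite loop even if the readings never settle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical snapshot of the low/medium/critical pressure counters.
struct Counters {
  unsigned long low;
  unsigned long medium;
  unsigned long critical;

  bool operator==(const Counters& that) const {
    return low == that.low &&
           medium == that.medium &&
           critical == that.critical;
  }
};

// Calls `read` until two consecutive snapshots are equal and stores
// that snapshot in `*result`. Performs at most `maxReads` reads and
// returns false if no two consecutive snapshots matched (assumed
// policy, not taken from the patch).
bool readStable(
    const std::function<Counters()>& read,
    std::size_t maxReads,
    Counters* result)
{
  assert(maxReads >= 2 && result != nullptr);

  Counters previous = read();
  for (std::size_t n = 1; n < maxReads; ++n) {
    Counters current = read();
    if (current == previous) {
      *result = current; // Stable: the caller may now advance 'i'.
      return true;
    }
    previous = current;  // Not stable yet: retry without advancing.
  }
  return false;
}
```

A caller would only increment its loop counter after `readStable` returns true, and could fail the test explicitly when it returns false.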


> On Aug. 1, 2015, 3:30 p.m., Benjamin Hindman wrote:
> > src/tests/containerizer/memory_test_helper.cpp, line 93
> > <https://reviews.apache.org/r/36627/diff/5/?file=1024883#file1024883line93>
> >
> >     Any specific reason to memset it to 0 instead of 1?

We thought that the compiler might treat '0' as a special value that could affect optimization behaviour. It does not really seem to matter, so I'll revert the change.
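
For context, the concern was roughly the following, shown as a standalone sketch rather than the helper's actual code. `touchPages` and the 4096-byte stride are hypothetical; the read-back through a `volatile` pointer is just one assumed way to keep the fill from being optimized away, whichever value is used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Allocates `size` bytes, fills them with `value`, and checks one byte
// per assumed 4096-byte page. Returns false if allocation fails or a
// checked byte does not hold `value`.
bool touchPages(std::size_t size, unsigned char value)
{
  assert(size > 0);

  void* buffer = std::malloc(size);
  if (buffer == nullptr) {
    return false;
  }

  std::memset(buffer, value, size);

  // Reading through a volatile pointer forces real loads from the
  // buffer, so the compiler cannot treat the memset as a dead store.
  const volatile unsigned char* bytes =
    static_cast<const volatile unsigned char*>(buffer);

  bool ok = true;
  for (std::size_t i = 0; i < size; i += 4096) {
    ok = ok && (bytes[i] == value);
  }

  std::free(buffer);
  return ok;
}
```

Both `touchPages(1 << 20, 1)` and `touchPages(1 << 20, 0)` should return true here, which matches the observation that the fill value does not seem to matter.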


- Artem


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/36627/#review93849
-----------------------------------------------------------

