Posted to dev@mesos.apache.org by "Thomas Marshall (JIRA)" <ji...@apache.org> on 2013/04/24 03:24:14 UTC

[jira] [Comment Edited] (MESOS-424) CgroupsIsolatorTest.BalloonFramework runs forever

    [ https://issues.apache.org/jira/browse/MESOS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639940#comment-13639940 ] 

Thomas Marshall edited comment on MESOS-424 at 4/24/13 1:23 AM:
----------------------------------------------------------------

Sorry if I'm not being very clear, but the behavior I'm seeing is frustratingly random.

I'm certain Ubuntu doesn't mount cgroups during init unless the cgroup-lite package is installed, and I don't have it installed.
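
You can check whether anything (cgroup-lite or otherwise) has mounted cgroup hierarchies by reading `/proc/mounts`. Each line there lists the device, mount point, filesystem type and mount options. The helper below is a minimal sketch and is not part of Mesos. `cgroup_mounts` is a hypothetical name, and the helper assumes cgroup v1 entries whose filesystem type is `cgroup`.

```python
from typing import List, Tuple


def cgroup_mounts(mounts_text: str) -> List[Tuple[str, List[str]]]:
    """Return (mount point, mount options) for every cgroup v1 entry.

    Hypothetical helper: expects text in /proc/mounts format, i.e.
    'device mountpoint fstype options dump pass' on each line.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype == "cgroup":
            result.append((mount_point, options.split(",")))
    return result
```

On the test machine, an empty list from `cgroup_mounts(open("/proc/mounts").read())` before the tests start would be consistent with nothing mounting cgroups during init.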

The failures are completely nondeterministic: sometimes the very first BalloonFramework test run on a fresh machine fails, and sometimes BalloonFramework succeeds after multiple failed runs.
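
A failing run hangs rather than exiting. One way to measure how often that happens is to rerun the test under a wall-clock timeout and count the outcomes. The sketch below relies only on the `timeout` support in Python's `subprocess.run`. `classify_run` and `tally` are hypothetical names, and the timeout value is the caller's choice.

```python
import subprocess
from collections import Counter
from typing import Sequence


def classify_run(cmd: Sequence[str], timeout_s: float) -> str:
    """Run cmd once and report 'pass', 'fail', or 'hang'.

    'hang' means the process had not exited within timeout_s seconds.
    subprocess.run kills the direct child in that case; grandchildren
    it spawned may survive and need separate cleanup.
    """
    try:
        completed = subprocess.run(
            list(cmd),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    return "pass" if completed.returncode == 0 else "fail"


def tally(cmd: Sequence[str], runs: int, timeout_s: float) -> Counter:
    """Repeat cmd `runs` times and count each outcome."""
    return Counter(classify_run(cmd, timeout_s) for _ in range(runs))
```

For example, `tally(["bin/mesos-tests.sh", "--gtest_filter=*Balloon*"], runs=20, timeout_s=300)` gives a rough hang rate. The 300-second timeout is an arbitrary assumption.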

It's also worth noting that once BalloonFramework has failed and been left running for a long time before being killed manually, it starts causing other tests on the machine to fail as well, notably MasterTest.ShutdownUnregisteredExecutor.
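
One hypothesis worth checking, which is not confirmed in this thread, is that the killed run leaves cgroups behind under the Mesos hierarchy that still contain processes, and later tests then trip over them. The sketch below lists child cgroups whose `tasks` file is non-empty. It assumes a cgroup v1 layout like the `/cgroup/mesos/...` paths in the log below. `busy_cgroups` is a hypothetical helper.

```python
import os
from typing import Dict, List


def busy_cgroups(root: str) -> Dict[str, List[int]]:
    """Map each cgroup directory under root (excluding root itself) to
    the PIDs listed in its 'tasks' file, keeping only non-empty ones.

    Assumes a cgroup v1 hierarchy, where every cgroup directory has a
    'tasks' file with one PID per line.
    """
    busy: Dict[str, List[int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if dirpath == root or "tasks" not in filenames:
            continue
        with open(os.path.join(dirpath, "tasks")) as f:
            pids = [int(line) for line in f if line.strip()]
        if pids:
            busy[dirpath] = pids
    return busy
```

Running `busy_cgroups("/cgroup/mesos")` after killing a hung run would show whether any executor cgroups still hold processes.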
                
      was (Author: twm378):
    Sorry if I'm not being very clear, but the behavior I'm seeing is frustratingly random.

I'm certain Ubuntu doesn't mount cgroups during init unless the cgroup-lite package is installed, and I don't have it installed.

The failures are completely nondeterministic: sometimes the very first BalloonFramework test run on a fresh machine fails, and sometimes BalloonFramework succeeds after multiple failed runs.
                  
> CgroupsIsolatorTest.BalloonFramework runs forever
> -------------------------------------------------
>
>                 Key: MESOS-424
>                 URL: https://issues.apache.org/jira/browse/MESOS-424
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Thomas Marshall
>
> On Ubuntu 12.04 Server, running as root:
> bin/mesos-tests.sh --gtest_filter=*Balloon* --verbose
> Source directory: /root/mesos
> Build directory: /root/mesos/build
> Note: Google Test filter = *Balloon*-
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from CgroupsIsolatorTest
> [ RUN      ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
> Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_1JMuXO'
> Launched master at 1770
> I0402 15:20:23.570971  1770 main.cpp:116] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:23.571444  1770 main.cpp:117] Starting Mesos master
> I0402 15:20:23.572792  1788 master.cpp:309] Master started on 127.0.1.1:5432
> I0402 15:20:23.573097  1788 master.cpp:324] Master ID: 201304021520-16842879-5432-1770
> W0402 15:20:23.574090  1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:23.577419  1788 master.cpp:603] Elected as master!
> Launched slave at 1790
> I0402 15:20:25.570708  1790 main.cpp:124] Creating "cgroups" isolator
> I0402 15:20:25.571761  1790 main.cpp:132] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:25.571790  1790 main.cpp:133] Starting Mesos slave
> I0402 15:20:25.574848  1808 slave.cpp:203] Slave started on 1)@127.0.1.1:51739
> I0402 15:20:25.574906  1808 slave.cpp:204] Slave resources: cpus=1; mem=96; ports=[31000-32000]; disk=7572
> I0402 15:20:25.575526  1805 cgroups_isolator.cpp:236] Using /cgroup as cgroups hierarchy root
> I0402 15:20:25.577657  1807 slave.cpp:453] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.577888  1807 status_update_manager.cpp:132] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.586076  1805 cgroups_isolator.cpp:690] Recovering isolator
> I0402 15:20:25.586915  1808 slave.cpp:377] Finished recovery
> I0402 15:20:25.588171  1787 master.cpp:968] Attempting to register slave on ubuntu at slave(1)@127.0.1.1:51739
> I0402 15:20:25.588276  1787 master.cpp:1224] Master now considering a slave at ubuntu:51739 as active
> I0402 15:20:25.589035  1787 master.cpp:1862] Adding slave 201304021520-16842879-5432-1770-0 at ubuntu with cpus=1; mem=96; ports=[31000-32000]; disk=7572
> I0402 15:20:25.589582  1787 hierarchical_allocator_process.hpp:395] Added slave 201304021520-16842879-5432-1770-0 (ubuntu) with cpus=1; mem=96; ports=[31000-32000]; disk=7572 (and cpus=1; mem=96; ports=[31000-32000]; disk=7572 available)
> I0402 15:20:25.589867  1807 slave.cpp:487] Registered with master; given slave ID 201304021520-16842879-5432-1770-0
> I0402 15:20:27.567234  1786 master.cpp:646] Registering framework 201304021520-16842879-5432-1770-0000 at scheduler(1)@127.0.1.1:54177
> I0402 15:20:27.567627  1786 hierarchical_allocator_process.hpp:268] Added framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.568018  1786 master.hpp:309] Adding offer with resources cpus=1; mem=96; ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> Registered
> I0402 15:20:27.568243  1786 master.cpp:1327] Sending 1 offers to framework 201304021520-16842879-5432-1770-0000
> Resource offers received
> Starting the task
> I0402 15:20:27.569226  1788 master.cpp:1534] Processing reply for offer 201304021520-16842879-5432-1770-0 on slave 201304021520-16842879-5432-1770-0 (ubuntu) for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.569449  1788 master.hpp:289] Adding task with resources mem=32 on slave 201304021520-16842879-5432-1770-0
> I0402 15:20:27.569537  1788 master.cpp:1651] Launching task 1 of framework 201304021520-16842879-5432-1770-0000 with resources mem=32 on slave 201304021520-16842879-5432-1770-0 (ubuntu)
> I0402 15:20:27.569792  1788 master.hpp:318] Removing offer with resources cpus=1; mem=96; ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> I0402 15:20:27.569903  1785 hierarchical_allocator_process.hpp:497] Framework 201304021520-16842879-5432-1770-0000 filtered slave 201304021520-16842879-5432-1770-0 for 5.00secs
> I0402 15:20:27.570047  1805 slave.cpp:587] Got assigned task 1 for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.572463  1805 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573072  1805 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573310  1806 cgroups_isolator.cpp:488] Launching default (/root/mesos/build/src/balloon-executor) in /tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e with resources mem=64 for framework 201304021520-16842879-5432-1770-0000 in cgroup mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:27.573943  1806 cgroups_isolator.cpp:631] Changing cgroup controls for executor default of framework 201304021520-16842879-5432-1770-0000 with resources mem=64
> I0402 15:20:27.574291  1806 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes' to 67108864 for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.574923  1806 cgroups_isolator.cpp:924] Started listening for OOM events for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.575889  1806 cgroups_isolator.cpp:517] Forked executor at = 1829
> Fetching resources into '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.641137  1808 slave.cpp:1046] Got registration for executor 'default' of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641315  1808 slave.cpp:1121] Flushing queued tasks for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641386  1808 cgroups_isolator.cpp:631] Changing cgroup controls for executor default of framework 201304021520-16842879-5432-1770-0000 with resources mem=96
> I0402 15:20:27.641913  1808 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes' to 100663296 for executor default of framework 201304021520-16842879-5432-1770-0000
> W0402 15:20:28.575875  1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:28.897797  1807 cgroups_isolator.cpp:944] OOM notifier is triggered for executor default of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.897902  1807 cgroups_isolator.cpp:989] OOM detected for executor default of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.899562  1807 cgroups_isolator.cpp:1030] Memory limit exceeded: Requested: 96MB Used: 96MB
> MEMORY STATISTICS: 
> cache 0
> rss 100663296
> mapped_file 0
> swap 2424832
> pgpgin 25936
> pgpgout 1360
> pgfault 31673
> pgmajfault 1
> inactive_anon 0
> active_anon 0
> inactive_file 0
> active_file 0
> unevictable 100663296
> hierarchical_memory_limit 100663296
> hierarchical_memsw_limit 9223372036854775807
> total_cache 0
> total_rss 100663296
> total_mapped_file 0
> total_swap 2424832
> total_pgpgin 25936
> total_pgpgout 1360
> total_pgfault 31673
> total_pgmajfault 1
> total_inactive_anon 0
> total_active_anon 0
> total_inactive_file 0
> total_active_file 0
> total_unevictable 100663296
> I0402 15:20:28.899739  1807 cgroups_isolator.cpp:596] Killing executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:28.901882  1805 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:32.578037  1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:33.578172  1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:34.065656  1805 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:34.067944  1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.068300  1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.582098  1805 cgroups_isolator.cpp:766] Executor default of framework 201304021520-16842879-5432-1770-0000 terminated with status 9
> W0402 15:20:37.579793  1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:38.580425  1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:39.216334  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:42.580739  1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:43.582556  1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:44.377604  1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:44.379775  1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:44.379935  1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:47.581902  1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:48.584782  1786 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:49.528096  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:52.583258  1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:53.586912  1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:54.691306  1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:54.693431  1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:54.693737  1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:57.584837  1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:58.588543  1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:59.842075  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:02.586467  1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:03.590638  1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:21:05.003955  1806 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:21:05.006346  1807 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:05.006577  1807 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:07.588361  1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:08.592641  1786 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:21:10.155868  1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:12.590788  1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:13.594530  1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:21:15.316937  1807 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:21:15.319368  1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:15.319533  1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:17.591588  1805 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> ...
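
The byte counts in the log above match the megabyte figures if 1 MB is taken as 1024 * 1024 bytes. The check below is a minimal worked example that uses only numbers copied from the log:

- the 64 MB limit set at launch;
- the 96 MB limit set after the queued task is flushed;
- `rss` equal to `hierarchical_memory_limit` at the OOM event, meaning the executor had used its entire limit;
- `hierarchical_memsw_limit` equal to 2^63 - 1, the largest signed 64-bit integer, which suggests no memory+swap limit was in effect.

`mb_to_bytes` and the constant names are illustrative.

```python
MIB = 1024 * 1024  # the log's "MB" values are binary megabytes


def mb_to_bytes(mb: int) -> int:
    """Convert a Mesos 'mem' resource value (MB) to bytes."""
    return mb * MIB


# Values copied from the log.
LIMIT_AT_LAUNCH = 67108864         # memory.limit_in_bytes for mem=64
LIMIT_AFTER_FLUSH = 100663296      # memory.limit_in_bytes for mem=96
RSS_AT_OOM = 100663296             # 'rss' in MEMORY STATISTICS
SWAP_AT_OOM = 2424832              # 'swap' in MEMORY STATISTICS
MEMSW_LIMIT = 9223372036854775807  # 'hierarchical_memsw_limit'

assert mb_to_bytes(64) == LIMIT_AT_LAUNCH
assert mb_to_bytes(96) == LIMIT_AFTER_FLUSH
assert RSS_AT_OOM == LIMIT_AFTER_FLUSH  # the whole limit was in use
assert SWAP_AT_OOM / MIB == 2.3125      # about 2.3 MB had been swapped out
assert MEMSW_LIMIT == 2**63 - 1         # largest signed 64-bit value
```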

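After the OOM, the tail of the log repeats one cycle roughly every 10 seconds:

1. a freeze attempt;
2. "Unable to freeze ... within 51 attempts" about 5 seconds later;
3. a thaw;
4. another freeze attempt.

The executor itself is reported as terminated with status 9, but its cgroup is never successfully frozen, and the cycle shows no sign of ending. That matches the issue title.

The sketch below models that pattern with a bounded poll loop against an abstract freezer. It is not the Mesos implementation. `Freezer`, `try_freeze`, and the attempt and interval values are hypothetical. In cgroup v1, a freezer cgroup reports `THAWED`, `FREEZING`, or `FROZEN` in `freezer.state`.

```python
import time
from typing import Callable, Protocol


class Freezer(Protocol):
    """Hypothetical interface over a cgroup v1 freezer."""

    def request(self, state: str) -> None: ...  # write 'FROZEN' or 'THAWED'
    def state(self) -> str: ...                 # read freezer.state


def try_freeze(freezer: Freezer, attempts: int, interval_s: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Request FROZEN and poll up to `attempts` times.

    On success the cgroup is left frozen and True is returned. On
    failure it is thawed again (like the 'Trying to thaw' lines) and
    False is returned so the caller can decide whether to retry.
    """
    for _ in range(attempts):
        freezer.request("FROZEN")
        if freezer.state() == "FROZEN":
            return True
        sleep(interval_s)
    freezer.request("THAWED")
    return False
```

A caller that keeps retrying `try_freeze` with no overall deadline reproduces the endless cycle in the log. Bounding the total number of retries would turn the hang into a reportable error instead.
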
--
This message is automatically generated by JIRA.