Posted to dev@mesos.apache.org by "Thomas Marshall (JIRA)" <ji...@apache.org> on 2013/04/24 03:24:14 UTC
[jira] [Comment Edited] (MESOS-424)
CgroupsIsolatorTest.BalloonFramework runs forever
[ https://issues.apache.org/jira/browse/MESOS-424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639940#comment-13639940 ]
Thomas Marshall edited comment on MESOS-424 at 4/24/13 1:23 AM:
----------------------------------------------------------------
Sorry if I'm not being very clear, but the behavior I'm seeing is frustratingly random.
I'm certain Ubuntu doesn't mount cgroups during init unless you have the cgroup-lite package installed, which I don't.
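A quick way to confirm that no cgroup hierarchy was mounted at boot is to look for entries of filesystem type `cgroup` in `/proc/mounts`. Below is a minimal Python sketch. It assumes the standard six-field `/proc/mounts` layout and cgroup v1 (type `cgroup`). The helper name `cgroup_mounts` is hypothetical and is not part of Mesos or Ubuntu.

```python
import os


def cgroup_mounts(mounts_text):
    """Return (mount_point, options) for every cgroup v1 mount listed.

    Each /proc/mounts line has six whitespace-separated fields:
    device, mount point, filesystem type, options, dump, pass.
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup":
            result.append((fields[1], fields[3].split(",")))
    return result


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/mounts exists.
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            found = cgroup_mounts(f.read())
        if not found:
            print("no cgroup hierarchies mounted")
        for mount_point, options in found:
            print(mount_point, ",".join(options))
```

Run this on a fresh machine before starting the tests. If no init script mounted a hierarchy, it should report that none are mounted. The slave log quoted below reports `/cgroup` as the hierarchy root once the tests are running.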
The failures are completely nondeterministic: sometimes the very first BalloonFramework test run on a fresh machine fails, and sometimes BalloonFramework succeeds after multiple failed runs.
It's also worth noting that once BalloonFramework has failed and been left running for a long time before being killed manually, it starts causing other tests on the machine to fail as well, notably MasterTest.ShutdownUnregisteredExecutor.
was (Author: twm378):
Sorry if I'm not being very clear, but the behavior I'm seeing is frustratingly random.
I'm certain Ubuntu doesn't mount cgroups during init unless you have the cgroup-lite package installed, which I don't.
The failures are completely nondeterministic: sometimes the very first BalloonFramework test run on a fresh machine fails, and sometimes BalloonFramework succeeds after multiple failed runs.
> CgroupsIsolatorTest.BalloonFramework runs forever
> -------------------------------------------------
>
> Key: MESOS-424
> URL: https://issues.apache.org/jira/browse/MESOS-424
> Project: Mesos
> Issue Type: Bug
> Reporter: Thomas Marshall
>
> On Ubuntu 12.04 Server, running as root:
> bin/mesos-tests.sh --gtest_filter=*Balloon* --verbose
> Source directory: /root/mesos
> Build directory: /root/mesos/build
> Note: Google Test filter = *Balloon*-
> [==========] Running 1 test from 1 test case.
> [----------] Global test environment set-up.
> [----------] 1 test from CgroupsIsolatorTest
> [ RUN ] CgroupsIsolatorTest.ROOT_CGROUPS_BalloonFramework
> Using temporary directory '/tmp/CgroupsIsolatorTest_ROOT_CGROUPS_BalloonFramework_1JMuXO'
> Launched master at 1770
> I0402 15:20:23.570971 1770 main.cpp:116] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:23.571444 1770 main.cpp:117] Starting Mesos master
> I0402 15:20:23.572792 1788 master.cpp:309] Master started on 127.0.1.1:5432
> I0402 15:20:23.573097 1788 master.cpp:324] Master ID: 201304021520-16842879-5432-1770
> W0402 15:20:23.574090 1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:23.577419 1788 master.cpp:603] Elected as master!
> Launched slave at 1790
> I0402 15:20:25.570708 1790 main.cpp:124] Creating "cgroups" isolator
> I0402 15:20:25.571761 1790 main.cpp:132] Build: 2013-04-02 14:41:50 by root
> I0402 15:20:25.571790 1790 main.cpp:133] Starting Mesos slave
> I0402 15:20:25.574848 1808 slave.cpp:203] Slave started on 1)@127.0.1.1:51739
> I0402 15:20:25.574906 1808 slave.cpp:204] Slave resources: cpus=1; mem=96; ports=[31000-32000]; disk=7572
> I0402 15:20:25.575526 1805 cgroups_isolator.cpp:236] Using /cgroup as cgroups hierarchy root
> I0402 15:20:25.577657 1807 slave.cpp:453] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.577888 1807 status_update_manager.cpp:132] New master detected at master@127.0.0.1:5432
> I0402 15:20:25.586076 1805 cgroups_isolator.cpp:690] Recovering isolator
> I0402 15:20:25.586915 1808 slave.cpp:377] Finished recovery
> I0402 15:20:25.588171 1787 master.cpp:968] Attempting to register slave on ubuntu at slave(1)@127.0.1.1:51739
> I0402 15:20:25.588276 1787 master.cpp:1224] Master now considering a slave at ubuntu:51739 as active
> I0402 15:20:25.589035 1787 master.cpp:1862] Adding slave 201304021520-16842879-5432-1770-0 at ubuntu with cpus=1; mem=96; ports=[31000-32000]; disk=7572
> I0402 15:20:25.589582 1787 hierarchical_allocator_process.hpp:395] Added slave 201304021520-16842879-5432-1770-0 (ubuntu) with cpus=1; mem=96; ports=[31000-32000]; disk=7572 (and cpus=1; mem=96; ports=[31000-32000]; disk=7572 available)
> I0402 15:20:25.589867 1807 slave.cpp:487] Registered with master; given slave ID 201304021520-16842879-5432-1770-0
> I0402 15:20:27.567234 1786 master.cpp:646] Registering framework 201304021520-16842879-5432-1770-0000 at scheduler(1)@127.0.1.1:54177
> I0402 15:20:27.567627 1786 hierarchical_allocator_process.hpp:268] Added framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.568018 1786 master.hpp:309] Adding offer with resources cpus=1; mem=96; ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> Registered
> I0402 15:20:27.568243 1786 master.cpp:1327] Sending 1 offers to framework 201304021520-16842879-5432-1770-0000
> Resource offers received
> Starting the task
> I0402 15:20:27.569226 1788 master.cpp:1534] Processing reply for offer 201304021520-16842879-5432-1770-0 on slave 201304021520-16842879-5432-1770-0 (ubuntu) for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.569449 1788 master.hpp:289] Adding task with resources mem=32 on slave 201304021520-16842879-5432-1770-0
> I0402 15:20:27.569537 1788 master.cpp:1651] Launching task 1 of framework 201304021520-16842879-5432-1770-0000 with resources mem=32 on slave 201304021520-16842879-5432-1770-0 (ubuntu)
> I0402 15:20:27.569792 1788 master.hpp:318] Removing offer with resources cpus=1; mem=96; ports=[31000-32000]; disk=7572 on slave 201304021520-16842879-5432-1770-0
> I0402 15:20:27.569903 1785 hierarchical_allocator_process.hpp:497] Framework 201304021520-16842879-5432-1770-0000 filtered slave 201304021520-16842879-5432-1770-0 for 5.00secs
> I0402 15:20:27.570047 1805 slave.cpp:587] Got assigned task 1 for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.572463 1805 paths.hpp:302] Created executor directory '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573072 1805 slave.cpp:436] Successfully attached file '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.573310 1806 cgroups_isolator.cpp:488] Launching default (/root/mesos/build/src/balloon-executor) in /tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e with resources mem=64 for framework 201304021520-16842879-5432-1770-0000 in cgroup mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:27.573943 1806 cgroups_isolator.cpp:631] Changing cgroup controls for executor default of framework 201304021520-16842879-5432-1770-0000 with resources mem=64
> I0402 15:20:27.574291 1806 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes' to 67108864 for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.574923 1806 cgroups_isolator.cpp:924] Started listening for OOM events for executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.575889 1806 cgroups_isolator.cpp:517] Forked executor at = 1829
> Fetching resources into '/tmp/mesos/slaves/201304021520-16842879-5432-1770-0/frameworks/201304021520-16842879-5432-1770-0000/executors/default/runs/a6115dfa-8195-4cf4-b044-f9b7e7531e9e'
> I0402 15:20:27.641137 1808 slave.cpp:1046] Got registration for executor 'default' of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641315 1808 slave.cpp:1121] Flushing queued tasks for framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:27.641386 1808 cgroups_isolator.cpp:631] Changing cgroup controls for executor default of framework 201304021520-16842879-5432-1770-0000 with resources mem=96
> I0402 15:20:27.641913 1808 cgroups_isolator.cpp:898] Updated 'memory.limit_in_bytes' to 100663296 for executor default of framework 201304021520-16842879-5432-1770-0000
> W0402 15:20:28.575875 1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:28.897797 1807 cgroups_isolator.cpp:944] OOM notifier is triggered for executor default of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.897902 1807 cgroups_isolator.cpp:989] OOM detected for executor default of framework 201304021520-16842879-5432-1770-0000 with uuid a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:28.899562 1807 cgroups_isolator.cpp:1030] Memory limit exceeded: Requested: 96MB Used: 96MB
> MEMORY STATISTICS:
> cache 0
> rss 100663296
> mapped_file 0
> swap 2424832
> pgpgin 25936
> pgpgout 1360
> pgfault 31673
> pgmajfault 1
> inactive_anon 0
> active_anon 0
> inactive_file 0
> active_file 0
> unevictable 100663296
> hierarchical_memory_limit 100663296
> hierarchical_memsw_limit 9223372036854775807
> total_cache 0
> total_rss 100663296
> total_mapped_file 0
> total_swap 2424832
> total_pgpgin 25936
> total_pgpgout 1360
> total_pgfault 31673
> total_pgmajfault 1
> total_inactive_anon 0
> total_active_anon 0
> total_inactive_file 0
> total_active_file 0
> total_unevictable 100663296
> I0402 15:20:28.899739 1807 cgroups_isolator.cpp:596] Killing executor default of framework 201304021520-16842879-5432-1770-0000
> I0402 15:20:28.901882 1805 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:32.578037 1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:33.578172 1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:34.065656 1805 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:34.067944 1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.068300 1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:34.582098 1805 cgroups_isolator.cpp:766] Executor default of framework 201304021520-16842879-5432-1770-0000 terminated with status 9
> W0402 15:20:37.579793 1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:38.580425 1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:39.216334 1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:42.580739 1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:43.582556 1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:44.377604 1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:44.379775 1805 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:44.379935 1805 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:47.581902 1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:48.584782 1786 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:49.528096 1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:52.583258 1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:53.586912 1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:20:54.691306 1808 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:20:54.693431 1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:20:54.693737 1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:20:57.584837 1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:20:58.588543 1788 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:20:59.842075 1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:02.586467 1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:03.590638 1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:21:05.003955 1806 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:21:05.006346 1807 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:05.006577 1807 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:07.588361 1807 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:08.592641 1786 master.cpp:81] No whitelist given. Advertising offers for all slaves
> I0402 15:21:10.155868 1807 cgroups.cpp:1175] Trying to freeze cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:12.590788 1806 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> W0402 15:21:13.594530 1787 master.cpp:81] No whitelist given. Advertising offers for all slaves
> W0402 15:21:15.316937 1807 cgroups.cpp:1261] Unable to freeze /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e within 51 attempts
> I0402 15:21:15.319368 1808 cgroups.cpp:1190] Trying to thaw cgroup /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> I0402 15:21:15.319533 1808 cgroups.cpp:1298] Successfully thawed /cgroup/mesos/framework_201304021520-16842879-5432-1770-0000_executor_default_tag_a6115dfa-8195-4cf4-b044-f9b7e7531e9e
> W0402 15:21:17.591588 1805 monitor.cpp:212] Failed to collect resource usage for executor 'default' of framework '201304021520-16842879-5432-1770-0000': 1
> ...
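The quoted log shows a repeating cycle. The slave tries to freeze the executor's cgroup and reports that it was unable to freeze it within 51 attempts. It then thaws the cgroup and starts over about ten seconds later. These freeze attempts continue even after the executor is reported as terminated with status 9.

The Python sketch below models that bounded freeze-then-thaw pattern against the cgroup v1 freezer interface. That interface is the `freezer.state` file, which accepts `FROZEN`/`THAWED` and reads `FREEZING` while a freeze is still in progress. This is an illustration under assumptions, not the Mesos implementation. The function names, the polling interval, and the idea of injecting the file access are all hypothetical. The injected file access is there so the loop can be exercised without a real cgroup.

```python
import os
import time


def try_freeze(read_state, write_state, max_attempts=51, interval=0.1):
    """Bounded attempt to freeze a cgroup v1 freezer; thaw again on failure.

    read_state() and write_state(value) stand in for reading and writing the
    cgroup's freezer.state file. Returns True once the state reads FROZEN;
    otherwise thaws the cgroup and returns False after max_attempts polls.
    """
    for _ in range(max_attempts):
        # (Re)request the freeze, then check whether every task has stopped.
        write_state("FROZEN")
        if read_state().strip() == "FROZEN":
            return True
        # Not FROZEN yet (typically FREEZING while some task has not stopped).
        time.sleep(interval)
    # Give up and leave the tasks runnable instead of half-frozen.
    write_state("THAWED")
    return False


def freezer_file(cgroup_dir):
    """Build read/write callbacks for <cgroup_dir>/freezer.state."""
    path = os.path.join(cgroup_dir, "freezer.state")

    def read_state():
        with open(path) as f:
            return f.read()

    def write_state(value):
        with open(path, "w") as f:
            f.write(value)

    return read_state, write_state
```

Against a real hierarchy, call it as `try_freeze(*freezer_file(path))`, where `path` is the executor's cgroup directory shown in the log. The `freezer.state` file only exists in a hierarchy with the freezer subsystem attached. In the quoted run, every cycle ends the way a `False` return would.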
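The quoted log sets the memory limits in bytes. 67108864 bytes is 64 MB (64 * 1024 * 1024), and 100663296 bytes is 96 MB, matching the `mem=64` and `mem=96` resources. The `MEMORY STATISTICS` dump uses the `key value` layout of the cgroup v1 `memory.stat` file. In that dump, `total_rss` equals `hierarchical_memory_limit` (100663296), so the executor was at its limit when the OOM notifier fired.

Below is a small Python sketch for parsing such a dump and checking that condition. The function names `parse_memory_stat` and `at_memory_limit` are hypothetical.

```python
MIB = 1024 * 1024


def parse_memory_stat(text):
    """Parse memory.stat style 'key value' lines into a dict of ints.

    Lines that are not a key followed by a non-negative integer
    (such as a 'MEMORY STATISTICS:' heading) are skipped.
    """
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def at_memory_limit(stats):
    """True if resident memory has reached the hierarchical memory limit."""
    return stats["total_rss"] >= stats["hierarchical_memory_limit"]


if __name__ == "__main__":
    # Excerpt of the statistics printed in the quoted log.
    sample = """rss 100663296
swap 2424832
hierarchical_memory_limit 100663296
total_rss 100663296"""
    stats = parse_memory_stat(sample)
    print(stats["hierarchical_memory_limit"] // MIB, "MB limit")
    print("at limit:", at_memory_limit(stats))
```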
--
This message is automatically generated by JIRA.