Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/05/05 00:06:16 UTC

[jira] [Created] (MESOS-461) Freezer invoked multiple times for the same cgroup.

Benjamin Mahler created MESOS-461:
-------------------------------------

             Summary: Freezer invoked multiple times for the same cgroup.
                 Key: MESOS-461
                 URL: https://issues.apache.org/jira/browse/MESOS-461
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler


Possibly related to MESOS-424 reported by Thomas, where the Freezer is being invoked multiple times.

This also just occurred in a production environment. First, an OOM was triggered (interestingly, MESOS-424 also occurs with an OOM).

I kept the log lines for a second executor, 'gc', in case they are relevant.

I0504 20:48:44.481727 44475 cgroups_isolation_module.cpp:819] OOM notifier is triggered for executor E1 of framework F with tag T
I0504 20:48:44.571171 44475 cgroups_isolation_module.cpp:863] OOM detected for executor E1 of framework F with tag T
I0504 20:48:44.573693 44475 cgroups_isolation_module.cpp:889] MEMORY LIMIT: 1707081728 bytes
MEMORY USAGE: 1707081728 bytes
MEMORY STATISTICS:
cache 229376
rss 1706852352
mapped_file 16384
pgpgin 4722471
pgpgout 4305703
inactive_anon 0
active_anon 1706713088
inactive_file 102400
active_file 53248
unevictable 0
hierarchical_memory_limit 1707081728
total_cache 229376
total_rss 1706852352
total_mapped_file 16384
total_pgpgin 4722471
total_pgpgout 4305703
total_inactive_anon 0
total_active_anon 1706713088
total_inactive_file 102400
total_active_file 53248
total_unevictable 0
I0504 20:48:44.573789 44475 cgroups_isolation_module.cpp:534] Killing executor E1 of framework F
I0504 20:48:44.578125 44471 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/F_executor_E1_tag_UUID
I0504 20:48:46.009472 44464 slave.cpp:830] Status update: task system-gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F is now in state TASK_FINISHED
I0504 20:48:46.014714 44464 slave.cpp:727] Got acknowledgement of status update for task system-gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F
I0504 20:48:46.015807 44470 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F with resources cpus=0.19; disk=15; mem=127
I0504 20:48:46.024636 44470 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 194 for executor gc of framework F
I0504 20:48:46.043144 44470 cgroups_isolation_module.cpp:774] Updated 'memory.soft_limit_in_bytes' to 133169152 for executor gc of framework F
I0504 20:48:51.829247 44472 cgroups_isolation_module.cpp:633] Telling slave of terminated executor gc of framework F
I0504 20:48:51.830458 44472 cgroups_isolation_module.cpp:534] Killing executor gc of framework F
I0504 20:48:51.830574 44462 slave.cpp:1053] Executor 'gc' of framework F has exited with status 0
I0504 20:48:51.846076 44472 cgroups_isolation_module.cpp:819] OOM notifier is triggered for executor gc of framework F with tag T
I0504 20:48:51.846158 44472 cgroups_isolation_module.cpp:824] Discarded OOM notifier for executor gc of framework F with tag T
I0504 20:48:51.859122 44472 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/framework_F_executor_gc_tag_T
I0504 20:48:51.859297 44472 cgroups.cpp:1185] Successfully froze cgroup /cgroup/mesos/framework_F_executor_gc_tag_T after 1 attempts
I0504 20:48:51.893443 44473 cgroups.cpp:1161] Trying to thaw cgroup /cgroup/mesos/framework_F_executor_gc_tag_T
I0504 20:48:51.893656 44473 cgroups.cpp:1268] Successfully thawed /cgroup/mesos/framework_F_executor_gc_tag_T
I0504 20:48:51.946413 44473 cgroups_isolation_module.cpp:903] Successfully destroyed the cgroup mesos/framework_F_executor_gc_tag_T
I0504 20:49:05.155886 44463 cgroups.cpp:1185] Successfully froze cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID after 5 attempts
I0504 20:49:05.181869 44472 cgroups.cpp:1161] Trying to thaw cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID
I0504 20:49:05.195574 44472 cgroups.cpp:1268] Successfully thawed /cgroup/mesos/framework_F_executor_E1_tag_UUID

At this point, the cgroup has been frozen and thawed, and we then try to freeze the cgroup again!

I0504 20:49:11.364585 44468 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID
F0504 20:49:11.687748 44468 cgroups_isolation_module.cpp:905] Failed to destroy the cgroup mesos/framework_F_executor_E1_tag_UUID: Failed to kill tasks in nested cgroups: Collect failed: Failed to get process statistics: Failed to open '/proc/33071/stat'
*** Check failure stack trace: ***
    @     0x7fb4c489169d  google::LogMessage::Fail()
    @     0x7fb4c4897307  google::LogMessage::SendToLog()
    @     0x7fb4c4892f4c  google::LogMessage::Flush()
    @     0x7fb4c48931b6  google::LogMessageFatal::~LogMessageFatal()
    @     0x7fb4c46431f0  mesos::internal::slave::CgroupsIsolationModule::destroyWaited()
    @     0x7fb4c4653f77  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7fb4c4656c24  std::tr1::_Function_handler<>::_M_invoke()
    @     0x7fb4c478cd7b  std::tr1::function<>::operator()()
    @     0x7fb4c4745f9f  process::ProcessBase::visit()
    @     0x7fb4c475aec8  process::DispatchEvent::visit()
    @     0x7fb4c474f11d  process::ProcessManager::resume()
    @     0x7fb4c474f968  process::schedule()
    @     0x7fb4c3e6873d  start_thread
    @     0x7fb4c284cf6d  clone
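
To check other slave logs for the same symptom, a rough sketch like the following could count how often a freeze was started per cgroup. It only assumes the "Trying to freeze cgroup <path>" message format seen in the log above; the function names are made up for illustration and are not part of Mesos. Note that the first freeze line for E1 in the excerpt has a differently redacted path, so it would be counted under a separate key.

```python
import re
from collections import Counter

# Matches the "Trying to freeze cgroup <path>" lines seen in the log above.
# The exact message format is an assumption taken from this excerpt only.
FREEZE_RE = re.compile(r"Trying to freeze cgroup (\S+)")


def count_freeze_attempts(lines):
    """Return a Counter mapping cgroup path -> number of freeze invocations."""
    counts = Counter()
    for line in lines:
        match = FREEZE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeated_freezes(lines):
    """Return only the cgroups for which a freeze was started more than once."""
    return {cgroup: n for cgroup, n in count_freeze_attempts(lines).items() if n > 1}
```

Passing the lines of a slave log to repeated_freezes() would list any cgroup that the Freezer was invoked on more than once. The "Successfully froze" lines are deliberately not counted, since a single freeze can take several attempts.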
