You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/05/07 18:45:16 UTC

[jira] [Updated] (MESOS-461) Freezer failure while in FREEZING state.

     [ https://issues.apache.org/jira/browse/MESOS-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-461:
----------------------------------

    Summary: Freezer failure while in FREEZING state.  (was: Freezer invoked multiple times for the same cgroup.)
    
> Freezer failure while in FREEZING state.
> ----------------------------------------
>
>                 Key: MESOS-461
>                 URL: https://issues.apache.org/jira/browse/MESOS-461
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>
> I kept the logs relevant to a second executor 'gc', in case they are relevant.
> An OOM is triggered for the executor, after which we freeze it successfully.
> I0504 20:48:44.481727 44475 cgroups_isolation_module.cpp:819] OOM notifier is triggered for executor E1 of framework F with tag T
> I0504 20:48:44.571171 44475 cgroups_isolation_module.cpp:863] OOM detected for executor E1 of framework F with tag T
> I0504 20:48:44.573693 44475 cgroups_isolation_module.cpp:889] MEMORY LIMIT: 1707081728 bytes
> MEMORY USAGE: 1707081728 bytes
> MEMORY STATISTICS:
> cache 229376
> rss 1706852352
> mapped_file 16384
> pgpgin 4722471
> pgpgout 4305703
> inactive_anon 0
> active_anon 1706713088
> inactive_file 102400
> active_file 53248
> unevictable 0
> hierarchical_memory_limit 1707081728
> total_cache 229376
> total_rss 1706852352
> total_mapped_file 16384
> total_pgpgin 4722471
> total_pgpgout 4305703
> total_inactive_anon 0
> total_active_anon 1706713088
> total_inactive_file 102400
> total_active_file 53248
> total_unevictable 0
> I0504 20:48:44.573789 44475 cgroups_isolation_module.cpp:534] Killing executor E1 of framework F
> I0504 20:48:44.578125 44471 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/F_executor_E1_tag_UUID
> I0504 20:48:46.009472 44464 slave.cpp:830] Status update: task system-gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F is now in state TASK_FINISHED
> I0504 20:48:46.014714 44464 slave.cpp:727] Got acknowledgement of status update for task system-gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F
> I0504 20:48:46.015807 44470 cgroups_isolation_module.cpp:571] Changing cgroup controls for executor gc-30cd1abb-84cf-4a12-8c1b-13396a27867f of framework F with resources cpus=0.19; disk=15; mem=127
> I0504 20:48:46.024636 44470 cgroups_isolation_module.cpp:676] Updated 'cpu.shares' to 194 for executor gc of framework F
> I0504 20:48:46.043144 44470 cgroups_isolation_module.cpp:774] Updated 'memory.soft_limit_in_bytes' to 133169152 for executor gc of framework F
> I0504 20:48:51.829247 44472 cgroups_isolation_module.cpp:633] Telling slave of terminated executor gc of framework F
> I0504 20:48:51.830458 44472 cgroups_isolation_module.cpp:534] Killing executor gc of framework F
> I0504 20:48:51.830574 44462 slave.cpp:1053] Executor 'gc' of framework F has exited with status 0
> I0504 20:48:51.846076 44472 cgroups_isolation_module.cpp:819] OOM notifier is triggered for executor gc of framework F with tag T
> I0504 20:48:51.846158 44472 cgroups_isolation_module.cpp:824] Discarded OOM notifier for executor gc of framework F with tag T
> I0504 20:48:51.859122 44472 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/framework_F_executor_gc_tag_T
> I0504 20:48:51.859297 44472 cgroups.cpp:1185] Successfully froze cgroup /cgroup/mesos/framework_F_executor_gc_tag_T after 1 attempts
> I0504 20:48:51.893443 44473 cgroups.cpp:1161] Trying to thaw cgroup /cgroup/mesos/framework_F_executor_gc_tag_T
> I0504 20:48:51.893656 44473 cgroups.cpp:1268] Successfully thawed /cgroup/mesos/framework_F_executor_gc_tag_T
> I0504 20:48:51.946413 44473 cgroups_isolation_module.cpp:903] Successfully destroyed the cgroup mesos/framework_F_executor_gc_tag_T
> I0504 20:49:05.155886 44463 cgroups.cpp:1185] Successfully froze cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID after 5 attempts
> I0504 20:49:05.181869 44472 cgroups.cpp:1161] Trying to thaw cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID
> I0504 20:49:05.195574 44472 cgroups.cpp:1268] Successfully thawed /cgroup/mesos/framework_F_executor_E1_tag_UUID
> At this point, the cgroup is frozen and thawed, and we then to freeze the cgroup again!
> I0504 20:49:11.364585 44468 cgroups.cpp:1146] Trying to freeze cgroup /cgroup/mesos/framework_F_executor_E1_tag_UUID
> F0504 20:49:11.687748 44468 cgroups_isolation_module.cpp:905] Failed to destroy the cgroup mesos/framework_F_executor_E1_tag_UUID: Failed to kill tasks in nested cgroups: Collect failed: Failed to get process statistics: Failed to open '/proc/33071/stat'
> *** Check failure stack trace: ***
>     @     0x7fb4c489169d  google::LogMessage::Fail()
>     @     0x7fb4c4897307  google::LogMessage::SendToLog()
>     @     0x7fb4c4892f4c  google::LogMessage::Flush()
>     @     0x7fb4c48931b6  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7fb4c46431f0  mesos::internal::slave::CgroupsIsolationModule::destroyWaited()
>     @     0x7fb4c4653f77  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7fb4c4656c24  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7fb4c478cd7b  std::tr1::function<>::operator()()
>     @     0x7fb4c4745f9f  process::ProcessBase::visit()
>     @     0x7fb4c475aec8  process::DispatchEvent::visit()
>     @     0x7fb4c474f11d  process::ProcessManager::resume()
>     @     0x7fb4c474f968  process::schedule()
>     @     0x7fb4c3e6873d  start_thread
>     @     0x7fb4c284cf6d  clone

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira