Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/03/14 03:02:12 UTC
[jira] [Created] (MESOS-391) Slave GarbageCollector needs to also
take into account the number of links, when determining removal time.
Benjamin Mahler created MESOS-391:
-------------------------------------
Summary: Slave GarbageCollector needs to also take into account the number of links, when determining removal time.
Key: MESOS-391
URL: https://issues.apache.org/jira/browse/MESOS-391
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Mahler
The slave garbage collector does not take the number of links into account when scheduling removal. If we create a lot of executor directories (up to LINK_MAX), GC is not necessarily triggered before the limit is reached.
As a result, the slave crashes when it fails to create the next executor directory:
F0313 21:40:02.926494 33746 paths.hpp:233] CHECK_SOME(mkdir) failed: Failed to create executor directory '/var/lib/mesos/slaves/201303090208-1937777162-5050-38880-267/frameworks/201103282247-0000000019-0000/executors/thermos-1363210801777-mesos-meta_slave_0-27-e74e4b30-dcf1-4e88-8954-dd2b40b7dd89/runs/499fcc13-c391-421c-93d2-a56d1a4a931e': Too many links
*** Check failure stack trace: ***
@ 0x7f9320f82f9d google::LogMessage::Fail()
@ 0x7f9320f88c07 google::LogMessage::SendToLog()
@ 0x7f9320f8484c google::LogMessage::Flush()
@ 0x7f9320f84ab6 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f9320c70312 _CheckSome::~_CheckSome()
@ 0x7f9320c9dd5c mesos::internal::slave::paths::createExecutorDirectory()
@ 0x7f9320c9e60d mesos::internal::slave::Framework::createExecutor()
@ 0x7f9320c7a7f7 mesos::internal::slave::Slave::runTask()
@ 0x7f9320c9cb43 ProtobufProcess<>::handler4<>()
@ 0x7f9320c8678b std::tr1::_Function_handler<>::_M_invoke()
@ 0x7f9320c9d1ab ProtobufProcess<>::visit()
@ 0x7f9320e4c774 process::MessageEvent::visit()
@ 0x7f9320e40a1d process::ProcessManager::resume()
@ 0x7f9320e41268 process::schedule()
@ 0x7f932055973d start_thread
@ 0x7f931ef3df6d clone
The fix here is to take into account the number of links (st_nlink) when determining whether we need to GC.
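A minimal sketch of what such a check could look like, assuming a POSIX system. The helper names (linkUsage, shouldGcForLinks) and the 0.9 threshold are hypothetical and are not part of Mesos. The sketch only compares a directory's st_nlink against the limit reported by pathconf(_PC_LINK_MAX). How the result would feed into the GC's removal time is left open.

```cpp
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not a Mesos API): returns the fraction of the
// link limit already used by `dir`, or -1.0 if it cannot be determined
// (stat failure, or no usable limit reported by the filesystem).
double linkUsage(const std::string& dir)
{
  struct stat s;
  if (::stat(dir.c_str(), &s) != 0) {
    return -1.0;
  }

  // pathconf returns -1 on error or when no limit is defined.
  long max = ::pathconf(dir.c_str(), _PC_LINK_MAX);
  if (max <= 0) {
    return -1.0;
  }

  return static_cast<double>(s.st_nlink) / static_cast<double>(max);
}

// Hypothetical policy: request GC once link usage reaches `threshold`.
// The default of 0.9 is an assumption chosen for illustration only.
bool shouldGcForLinks(const std::string& dir, double threshold = 0.9)
{
  double usage = linkUsage(dir);
  return usage >= 0.0 && usage >= threshold;
}
```

A caller could evaluate shouldGcForLinks() on the parent of the directory it is about to create and schedule removal of old directories early when it returns true, rather than letting mkdir fail with "Too many links".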