Posted to issues@mesos.apache.org by "Tobi Knaup (JIRA)" <ji...@apache.org> on 2015/05/31 03:10:17 UTC
[jira] [Created] (MESOS-2785) Slave crashes during checkpointing when no space is left on disk
Tobi Knaup created MESOS-2785:
---------------------------------
Summary: Slave crashes during checkpointing when no space is left on disk
Key: MESOS-2785
URL: https://issues.apache.org/jira/browse/MESOS-2785
Project: Mesos
Issue Type: Bug
Components: slave
Reporter: Tobi Knaup
This happened on a slave where tasks filled up the disk that work_dir is on.
Slave logs:
{noformat}
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: I0530 23:36:03.088995 1354 slave.cpp:1144] Got assigned task broker-2-fde59f6b-7437-4678-995e-8f9812e4f4bf for framework 20150530-210001-419692554-5050-1832-0001
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: F0530 23:36:03.089443 1354 slave.cpp:4136] CHECK_SOME(state::checkpoint(path, info)): Failed to write temporary file '/var/lib/mesos/slave/meta/slaves/20150530-210001-419692554-5050-1832-S4/frameworks/20150530-210001-419692554-5050-1832-0001/QuDFUs': Failed to write size: No space left on device
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: *** Check failure stack trace: ***
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c69fd google::LogMessage::Fail()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c889d google::LogMessage::SendToLog()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c65ec google::LogMessage::Flush()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8625c91be google::LogMessageFatal::~LogMessageFatal()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86308291c mesos::internal::slave::Framework::Framework()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb863085699 mesos::internal::slave::Slave::runTask()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8630af1fa ProtobufProcess<>::handler4<>()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86309346e std::_Function_handler<>::_M_invoke()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8630ab34a ProtobufProcess<>::visit()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86342100a process::ProcessManager::resume()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8634212cc process::schedule()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb86186a53d (unknown)
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @ 0x7fb8615a2f7d (unknown)
{noformat}
One workaround would be to add a command line option that sets a separate path for the sandbox dir, so that it can be placed on a different disk than work_dir.
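
For illustration only: the fatal line in the log above comes from a CHECK_SOME around state::checkpoint(path, info), so a failed write aborts the whole process. The sketch below is not the Mesos code. It is a minimal standalone example, assuming a hypothetical CheckpointResult type and a hypothetical checkpoint() helper, of writing a temporary file, renaming it into place, and reporting a failure such as "No space left on device" to the caller instead of aborting.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical result type (not a Mesos type): `ok` is false on failure and
// `error` then holds a human-readable message.
struct CheckpointResult {
  bool ok;
  std::string error;
};

// Hypothetical helper: writes `data` to a temporary file next to `path`,
// then renames it into place. Every failure is returned to the caller.
CheckpointResult checkpoint(const std::string& path, const std::string& data) {
  const std::string temp = path + ".tmp";

  std::FILE* file = std::fopen(temp.c_str(), "wb");
  if (file == nullptr) {
    return {false, "Failed to open temporary file '" + temp + "': " +
                       std::strerror(errno)};
  }

  // A full disk can surface at write, flush, or close time, so check all
  // three and remember the errno of the first failure.
  bool ok = std::fwrite(data.data(), 1, data.size(), file) == data.size();
  ok = ok && std::fflush(file) == 0;
  int savedErrno = ok ? 0 : errno;
  if (std::fclose(file) != 0 && ok) {
    ok = false;
    savedErrno = errno;
  }

  if (!ok) {
    std::remove(temp.c_str());  // Best-effort cleanup of the partial file.
    return {false, "Failed to write temporary file '" + temp + "': " +
                       std::strerror(savedErrno)};
  }

  if (std::rename(temp.c_str(), path.c_str()) != 0) {
    savedErrno = errno;
    std::remove(temp.c_str());
    return {false, "Failed to rename '" + temp + "' to '" + path + "': " +
                       std::strerror(savedErrno)};
  }

  return {true, ""};
}
```

With a helper shaped like this, the caller could log the returned error and decline the task rather than crash. Whether that is the right behavior for the slave is a design question this sketch does not settle.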
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)