Posted to issues@mesos.apache.org by "Tobi Knaup (JIRA)" <ji...@apache.org> on 2015/05/31 03:10:17 UTC

[jira] [Created] (MESOS-2785) Slave crashes during checkpointing when no space is left on disk

Tobi Knaup created MESOS-2785:
---------------------------------

             Summary: Slave crashes during checkpointing when no space is left on disk
                 Key: MESOS-2785
                 URL: https://issues.apache.org/jira/browse/MESOS-2785
             Project: Mesos
          Issue Type: Bug
          Components: slave
            Reporter: Tobi Knaup


This happened on a slave where tasks filled up the disk that work_dir is on.
Slave logs:

{noformat}
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: I0530 23:36:03.088995  1354 slave.cpp:1144] Got assigned task broker-2-fde59f6b-7437-4678-995e-8f9812e4f4bf for framework 20150530-210001-419692554-5050-1832-0001
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: F0530 23:36:03.089443  1354 slave.cpp:4136] CHECK_SOME(state::checkpoint(path, info)): Failed to write temporary file '/var/lib/mesos/slave/meta/slaves/20150530-210001-419692554-5050-1832-S4/frameworks/20150530-210001-419692554-5050-1832-0001/QuDFUs': Failed to write size: No space left on device
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: *** Check failure stack trace: ***
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8625c69fd  google::LogMessage::Fail()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8625c889d  google::LogMessage::SendToLog()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8625c65ec  google::LogMessage::Flush()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8625c91be  google::LogMessageFatal::~LogMessageFatal()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb86308291c  mesos::internal::slave::Framework::Framework()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb863085699  mesos::internal::slave::Slave::runTask()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8630af1fa  ProtobufProcess<>::handler4<>()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb86309346e  std::_Function_handler<>::_M_invoke()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8630ab34a  ProtobufProcess<>::visit()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb86342100a  process::ProcessManager::resume()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8634212cc  process::schedule()
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb86186a53d  (unknown)
May 30 23:36:03 ip-10-0-1-119.ec2.internal mesos-slave[1350]: @     0x7fb8615a2f7d  (unknown)
{noformat}
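The fatal line shows that the checkpoint is first written to a temporary file, and that the resulting ENOSPC error is fed straight into CHECK_SOME, which aborts the whole slave. The sketch below is a hedged illustration of the alternative: an atomic write-temp-then-rename checkpoint that returns the error to the caller instead of aborting. The helper name `checkpointAtomically` and its signature are hypothetical and are not the Mesos `state::checkpoint` API. Only standard POSIX calls (`mkstemp`, `write`, `fsync`, `close`, `rename`, `unlink`) are used.

```cpp
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <optional>
#include <string>
#include <unistd.h>

// Hypothetical helper, not the Mesos API. Writes `data` to a temporary file
// next to `path`, flushes it to disk, then renames it over `path`.
// Returns std::nullopt on success or an error message on failure, so the
// caller decides how to react instead of aborting the process.
std::optional<std::string> checkpointAtomically(const std::string& path,
                                                const std::string& data) {
  std::string tmp = path + ".XXXXXX";  // mkstemp() replaces the X's
  int fd = mkstemp(&tmp[0]);
  if (fd < 0) {
    return "Failed to create temporary file: " +
           std::string(std::strerror(errno));
  }

  // Closes the descriptor if still open, removes the partial temporary file,
  // and builds an error message from the saved errno value.
  auto fail = [&](const std::string& what, int err) {
    if (fd >= 0) close(fd);
    unlink(tmp.c_str());
    return std::optional<std::string>(
        what + " '" + tmp + "': " + std::strerror(err));
  };

  size_t written = 0;
  while (written < data.size()) {
    ssize_t n = write(fd, data.data() + written, data.size() - written);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal: retry
      return fail("Failed to write temporary file", errno);  // e.g. ENOSPC
    }
    written += static_cast<size_t>(n);
  }

  // On some filesystems a full disk is only reported at fsync() or close().
  if (fsync(fd) != 0) {
    return fail("Failed to sync temporary file", errno);
  }
  if (close(fd) != 0) {
    int err = errno;
    fd = -1;  // descriptor is released even when close() fails
    return fail("Failed to close temporary file", err);
  }
  fd = -1;

  // rename() within one directory atomically replaces the old checkpoint.
  if (std::rename(tmp.c_str(), path.c_str()) != 0) {
    return fail("Failed to rename temporary file", errno);
  }
  return std::nullopt;
}
```

A caller would inspect the returned error and, for example, refuse the incoming task while keeping the slave running, rather than wrapping the call in CHECK_SOME. Which recovery policy is appropriate is a design question this sketch does not settle.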

One workaround would be to add a command-line option that sets a separate path for the sandbox directory, so that task sandboxes can be placed on a different disk from the rest of work_dir, including the checkpointed metadata.
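A minimal sketch of the proposed workaround, assuming a hypothetical sandbox-directory setting (the option does not exist in the version described here, and the names `sandboxRoot` and `onSameDevice` are illustrative only): resolve the sandbox root, falling back to work_dir when the option is unset, then compare the `st_dev` fields reported by `stat(2)` to detect whether sandboxes and metadata would still share one filesystem.

```cpp
#include <optional>
#include <string>
#include <sys/stat.h>

// Hypothetical: use the (imagined) sandbox-directory option if it is set,
// otherwise keep sandboxes under work_dir as today.
std::string sandboxRoot(const std::string& workDir,
                        const std::optional<std::string>& sandboxDir) {
  return sandboxDir.value_or(workDir);
}

// Returns true if both paths live on the same device (same st_dev),
// false if they are on different devices, or std::nullopt if either
// path cannot be stat()ed.
std::optional<bool> onSameDevice(const std::string& a, const std::string& b) {
  struct stat sa{}, sb{};
  if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) {
    return std::nullopt;
  }
  return sa.st_dev == sb.st_dev;
}
```

If `onSameDevice(sandboxRoot(...), workDir)` returns true, a runaway task can still exhaust the disk that holds the checkpoints, so a slave could log a warning at startup in that case.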



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)