Posted to issues@mesos.apache.org by "Till Toenshoff (JIRA)" <ji...@apache.org> on 2018/07/02 19:57:00 UTC

[jira] [Created] (MESOS-9046) Agent restart may fail on checkpointed resources.

Till Toenshoff created MESOS-9046:
-------------------------------------

             Summary: Agent restart may fail on checkpointed resources.
                 Key: MESOS-9046
                 URL: https://issues.apache.org/jira/browse/MESOS-9046
             Project: Mesos
          Issue Type: Improvement
    Affects Versions: 1.6.0
            Reporter: Till Toenshoff


When a user changes the agent's resources and then restarts the agent, the resulting error message does not help in resolving the problem.

Consider a user who has added or changed a mounted volume and then restarts the agent after erasing only {{${MESOS_WORK_DIR}/meta/slaves/latest}}. The result may look as follows:

{noformat}
E0702 11:44:53.000000  2278 slave.cpp:7305] EXIT with status 1: Failed to perform recovery:
Checkpointed resources
[...]
 [MOUNT:/dcos/volume1,5b0ca558-7e1f-463a-87ab-4c52899c4727:name-data]:5851
are incompatible with agent resources
[...]
{noformat}

While this error message is correct, it is not as helpful as it could be. We should consider offering advice on how to work around or fix this very common issue.


We may want to tell the user to:
1. {{rm -rf ${MESOS_WORK_DIR}/meta/slaves/latest}}
2. {{rm -rf ${MESOS_WORK_DIR}/meta/resources}}
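The two steps above could be wrapped in a small shell function. This is only a sketch under stated assumptions: the agent has already been stopped, and the function's single argument is the agent work directory (what this report calls {{${MESOS_WORK_DIR}}}). The name {{reset_agent_checkpoints}} is hypothetical and is not part of Mesos.

```shell
# Hypothetical helper (not part of Mesos): remove the checkpointed agent
# state that blocks recovery after the agent's resources have changed.
# Assumption: the agent is stopped before this runs, and $1 is the agent
# work directory (${MESOS_WORK_DIR} in this report).
reset_agent_checkpoints() {
  # Abort with a usage message if no work directory was given.
  work_dir="${1:?usage: reset_agent_checkpoints WORK_DIR}"

  # Step 1: remove meta/slaves/latest. Without a trailing slash, rm
  # removes a symlink itself rather than the directory it points to.
  rm -rf "${work_dir}/meta/slaves/latest"

  # Step 2: remove the checkpointed resources. Erasing only step 1's
  # path leaves these behind, which produces the error shown above.
  rm -rf "${work_dir}/meta/resources"
}
```

Usage, with the agent stopped: {{reset_agent_checkpoints "${MESOS_WORK_DIR}"}}, then start the agent again. Both steps go into one function because, as the log above shows, performing only the first one is not enough for recovery to succeed.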



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)