Posted to issues@mesos.apache.org by "Gilbert Song (JIRA)" <ji...@apache.org> on 2019/01/30 20:27:00 UTC

[jira] [Assigned] (MESOS-9507) Agent could not recover due to empty docker volume checkpointed files.

     [ https://issues.apache.org/jira/browse/MESOS-9507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gilbert Song reassigned MESOS-9507:
-----------------------------------

        Assignee: Gilbert Song
          Sprint: Containerization RI10 Spr 39
    Story Points: 5

> Agent could not recover due to empty docker volume checkpointed files.
> ----------------------------------------------------------------------
>
>                 Key: MESOS-9507
>                 URL: https://issues.apache.org/jira/browse/MESOS-9507
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: Gilbert Song
>            Assignee: Gilbert Song
>            Priority: Critical
>              Labels: containerizer
>
> Agent could not recover due to empty docker volume checkpointed files. Please see logs:
> {noformat}
> Nov 12 17:12:00 guppy mesos-agent[38960]: E1112 17:12:00.978682 38969 slave.cpp:6279] EXIT with status 1: Failed to perform recovery: Collect failed: Collect failed: Failed to recover docker volumes for orphan container e1b04051-1e4a-47a9-b866-1d625cda1d22: JSON parse failed: syntax error at line 1 near:
> Nov 12 17:12:00 guppy mesos-agent[38960]: To remedy this do as follows: 
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 1: rm -f /var/lib/mesos/slave/meta/slaves/latest
> Nov 12 17:12:00 guppy mesos-agent[38960]: This ensures agent doesn't recover old live executors.
> Nov 12 17:12:00 guppy mesos-agent[38960]: Step 2: Restart the agent. 
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service: main process exited, code=exited, status=1/FAILURE
> Nov 12 17:12:00 guppy systemd[1]: Unit dcos-mesos-slave.service entered failed state.
> Nov 12 17:12:00 guppy systemd[1]: dcos-mesos-slave.service failed.
> {noformat}
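> This happens when the agent restarts and runs recovery after the docker volume state file has been created but before the checkpoint write has completed, which leaves the file empty. At that point the docker volume has not been mounted yet, so the docker volume isolator should skip recovering this volume rather than failing the entire agent recovery.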
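
The skip-on-empty behavior described above can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not the Mesos isolator (which is C++). The function name `recover_volumes`, the mapping from container ID to state-file path, and the JSON layout are all assumptions made for the example. An empty (or whitespace-only) state file is treated as "checkpoint never finished, volume never mounted" and skipped. A missing file is also skipped, which is an additional assumption. A non-empty file that still fails to parse is left as a hard error.

```python
import json
import os
import tempfile


def recover_volumes(state_paths):
    """Hypothetical recovery loop over checkpointed docker volume state files.

    state_paths maps a container ID to the path of its state file.
    Returns (recovered, skipped): parsed state per container, and the
    container IDs whose state was intentionally skipped.
    """
    recovered = {}
    skipped = []
    for container_id, path in state_paths.items():
        try:
            with open(path, "r", encoding="utf-8") as f:
                contents = f.read()
        except FileNotFoundError:
            # Assumption: no state file means nothing was checkpointed.
            skipped.append(container_id)
            continue

        if not contents.strip():
            # The file was created but the checkpoint write never completed,
            # so the volume was never mounted: skip it instead of failing.
            skipped.append(container_id)
            continue

        # A non-empty file that is not valid JSON is still a real error
        # (json.loads raises json.JSONDecodeError).
        recovered[container_id] = json.loads(contents)
    return recovered, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        full = os.path.join(d, "full")
        empty = os.path.join(d, "empty")
        # Hypothetical state layout, for illustration only.
        with open(full, "w", encoding="utf-8") as f:
            json.dump({"mounts": [{"volume": "vol1"}]}, f)
        open(empty, "w").close()  # simulates a crash right after creation
        print(recover_volumes({"c1": full, "c2": empty}))
```

The design choice is to distinguish "empty" from "corrupt". Only the empty case matches the crash window described in the report, so only that case is downgraded to a skip, and any other parse failure still surfaces.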



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)