Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/04/08 18:55:00 UTC

[jira] [Created] (MESOS-9709) Docker executor can become stuck terminating

Greg Mann created MESOS-9709:
--------------------------------

             Summary: Docker executor can become stuck terminating
                 Key: MESOS-9709
                 URL: https://issues.apache.org/jira/browse/MESOS-9709
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Greg Mann
         Attachments: docker-executor-stuck.txt

See attached agent log; the executor container ID is {{d2bfec33-f6bd-44ee-9345-b5710780bb59}} and the executor ID contains the string {{819f7ef7-4f42-11e9-a566-72ec67496045}}.
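
To pull the relevant lines out of the attached log, something like the following may help. This is only a minimal sketch: the helper name {{filter_log_lines}} is hypothetical, and it does nothing more than plain substring matching on the two identifiers quoted above.

```python
from typing import Iterable, Iterator

# Identifiers quoted in this report.
CONTAINER_ID = "d2bfec33-f6bd-44ee-9345-b5710780bb59"
EXECUTOR_ID_FRAGMENT = "819f7ef7-4f42-11e9-a566-72ec67496045"


def filter_log_lines(lines: Iterable[str], needles: Iterable[str]) -> Iterator[str]:
    """Yield each line that contains at least one of the given substrings.

    Hypothetical helper for reading the attachment; not part of Mesos.
    """
    needles = tuple(needles)  # allow the argument to be any iterable
    for line in lines:
        if any(needle in line for needle in needles):
            # Strip the trailing newline so callers can print lines directly.
            yield line.rstrip("\n")
```

Assuming the attachment has been saved locally as {{docker-executor-stuck.txt}}, passing the open file object and the tuple {{(CONTAINER_ID, EXECUTOR_ID_FRAGMENT)}} to the helper yields only the agent log lines that mention this executor or its container.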

After launching the executor, we see
{code}
Mar 29 18:23:36 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: I0329 18:23:36.967316 10257 slave.cpp:3550] Launching container d2bfec33-f6bd-44ee-9345-b5710780bb59 for executor 'datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339' of framework a221eeb3-b9c0-4e92-ae20-1e1d4af25321-0000
Mar 29 18:23:36 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: I0329 18:23:36.968968 10253 docker.cpp:1161] No container info found, skipping launch
{code}

I'm not sure why the container info was not set. Once the executor reregistration timeout elapses, the agent attempts to terminate the executor, but the termination does not appear to complete and the executor remains in the terminating state. The scheduler continues to try to kill the task, but we repeatedly see
{code}
Mar 29 18:35:19 int-mountvolumeagent9-soak113s.testing.mesosphe.re mesos-agent[10238]: W0329 18:35:19.855063 10253 slave.cpp:3823] Ignoring kill task datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339 because the executor 'datastax-dse.instance-819f7ef7-4f42-11e9-a566-72ec67496045._app.339' of framework a221eeb3-b9c0-4e92-ae20-1e1d4af25321-0000 is terminating
{code}


