Posted to issues@mesos.apache.org by "A. Dukhovniy (JIRA)" <ji...@apache.org> on 2018/08/17 13:12:00 UTC

[jira] [Created] (MESOS-9162) Unkillable pod container stuck in ISOLATING

A. Dukhovniy created MESOS-9162:
-----------------------------------

             Summary: Unkillable pod container stuck in ISOLATING
                 Key: MESOS-9162
                 URL: https://issues.apache.org/jira/browse/MESOS-9162
             Project: Mesos
          Issue Type: Bug
    Affects Versions: 1.6.0, 1.7.0
            Reporter: A. Dukhovniy


We have a simple test that launches a pod with two containers (one writes to a file and the other reads it). This test is flaky because one of the containers sometimes fails to start.
Marathon pod definition:

{code:java}
{
  "id": "/simple-pod",
  "scaling": {
    "kind": "fixed",
    "instances": 1
  },
  "environment": {
    "PING": "PONG"
  },
  "containers": [
    {
      "name": "ct1",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "image": {
        "kind": "DOCKER",
        "id": "busybox"
      },
      "exec": {
        "command": {
          "shell": "while true; do echo the current time is $(date) > ./test-v1/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "test-v1"
        }
      ]
    },
    {
      "name": "ct2",
      "resources": {
        "cpus": 0.1,
        "mem": 32
      },
      "exec": {
        "command": {
          "shell": "while true; do echo -n $PING ' '; cat ./etc/clock; sleep 1; done"
        }
      },
      "volumeMounts": [
        {
          "name": "v1",
          "mountPath": "etc"
        },
        {
          "name": "v2",
          "mountPath": "docker"
        }
      ]
    }
  ],
  "networks": [
    {
      "mode": "host"
    }
  ],
  "volumes": [
    {
      "name": "v1"
    },
    {
      "name": "v2",
      "host": "/var/lib/docker"
    }
  ]
}
{code}

During the test, Marathon tries to launch the pod but doesn't receive a {{TASK_RUNNING}} status. After 2 minutes it decides to kill the pod, and the kill also fails.

The pod sandbox (attached to this ticket) shows that one of the containers wasn't started properly. The last line in the agent log for that container says:
{code}
Transitioning the state of container ff4f4fdc-9327-42fb-be40-29e919e15aee.e9b05652-e779-46f8-9b76-b2e1ce7e5940 from PREPARING to ISOLATING
{code}
Until then, the log looks unremarkable.

Relevant Ids for grepping the logs:
{code}
Marathon app id: /simple-pod-bcc8f180b611494aa972520b8b650ca9
Mesos task id: simple-pod-bcc8f180b611494aa972520b8b650ca9.instance-1ad9ecbb-a1a7-11e8-b35a-6e17842c13e2.ct1
Mesos container id: e9b05652-e779-46f8-9b76-b2e1ce7e5940
{code}
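
To help with grepping, here is a minimal Python sketch that scans an agent log for the "Transitioning the state of container ..." lines quoted above and reports containers whose last recorded state is ISOLATING. The regular expression is based only on the single log line in this ticket; the function names ({{last_states}}, {{stuck_in}}) and the log path passed on the command line are made up for illustration and are not part of Mesos.

```python
import re
import sys

# Pattern derived from the one agent log line quoted in this ticket.
# Assumption: other transition lines follow the same wording.
TRANSITION = re.compile(
    r"Transitioning the state of container (?P<cid>\S+) "
    r"from (?P<old>\w+) to (?P<new>\w+)"
)


def last_states(lines):
    """Return a dict mapping container id to the last state it transitioned to."""
    states = {}
    for line in lines:
        match = TRANSITION.search(line)
        if match:
            # Later lines overwrite earlier ones, so the final value
            # is the most recent state seen for that container.
            states[match.group("cid")] = match.group("new")
    return states


def stuck_in(lines, state="ISOLATING"):
    """Return sorted container ids whose last recorded state equals `state`."""
    return sorted(cid for cid, s in last_states(lines).items() if s == state)


if __name__ == "__main__":
    # Usage (hypothetical file name): python stuck.py mesos-agent.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for container_id in stuck_in(log):
            print(container_id)
```

Run against the attached agent log, this should list the nested container id given above if ISOLATING really is its last recorded transition.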



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)