Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2018/02/08 17:25:00 UTC

[jira] [Assigned] (MESOS-8105) Docker containerizer fails with "Unable to get executor pid after launch"

     [ https://issues.apache.org/jira/browse/MESOS-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu reassigned MESOS-8105:
-----------------------------

    Assignee:     (was: Jie Yu)

> Docker containerizer fails with "Unable to get executor pid after launch"
> -------------------------------------------------------------------------
>
>                 Key: MESOS-8105
>                 URL: https://issues.apache.org/jira/browse/MESOS-8105
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>            Reporter: maybob
>            Priority: Major
>              Labels: docker
>
> When many commands are launched at the same time through Docker, each using the same executor with a different executorId, some executors fail with the error "Unable to get executor pid after launch".
> One possible cause is that "docker inspect" hangs, or exits 0 while reporting pid 0. Another possible cause is that the large number of Docker containers exhausts resources such as file descriptors.
> {color:red}Log:{color}
> {code:java}
> I1012 16:15:01.003931 124081 slave.cpp:1619] Got assigned task '920860' for framework framework-id-daily
> I1012 16:15:01.006091 124081 slave.cpp:1900] Authorizing task '920860' for framework framework-id-daily
> I1012 16:15:01.008281 124081 slave.cpp:2087] Launching task '920860' for framework framework-id-daily
> I1012 16:15:01.008779 124081 paths.cpp:573] Trying to chown '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3' to user 'maybob'
> I1012 16:15:01.009027 124081 slave.cpp:7401] Checkpointing ExecutorInfo to '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/executor.info'
> I1012 16:15:01.009546 124081 slave.cpp:7038] Launching executor 'Executor_920860' of framework framework-id-daily with resources {} in work directory '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
> I1012 16:15:01.010339 124081 slave.cpp:7429] Checkpointing TaskInfo to '/volumes/sdb1/mesos/meta/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3/tasks/920860/task.info'
> I1012 16:15:01.010726 124081 slave.cpp:2316] Queued task '920860' for executor 'Executor_920860' of framework framework-id-daily
> I1012 16:15:01.011740 124088 docker.cpp:1175] Starting container '29c82b61-1242-4de9-80cf-16f46c30e7e3' for executor 'Executor_920860' and framework framework-id-daily
> I1012 16:15:01.013123 124081 slave.cpp:877] Successfully attached file '/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3'
> I1012 16:15:01.013290 124080 fetcher.cpp:353] Starting to fetch URIs for container: 29c82b61-1242-4de9-80cf-16f46c30e7e3, directory: /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:01.706429 124071 docker.cpp:909] Running docker -H unix:///var/run/docker.sock run --cpu-shares 378 --memory 427819008 -e LIBPROCESS_PORT=0 -e MESOS_AGENT_ENDPOINT=xxx.xxx.xxx.xxx:5051 -e MESOS_CHECKPOINT=1 -e MESOS_CONTAINER_NAME=mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3 -e MESOS_DIRECTORY=/volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3 -e MESOS_EXECUTOR_ID=Executor_920860 -e MESOS_EXECUTOR_SHUTDOWN_GRACE_PERIOD=5secs -e MESOS_FRAMEWORK_ID=framework-id-daily -e MESOS_HTTP_COMMAND_EXECUTOR=0 -e MESOS_NATIVE_JAVA_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos-1.3.1.so -e MESOS_RECOVERY_TIMEOUT=15mins -e MESOS_SANDBOX=/mnt/mesos/sandbox -e MESOS_SLAVE_ID=89192f68-d28f-498c-808f-442a1ef576b3-S2 -e MESOS_SLAVE_PID=slave(1)@xxx.xxx.xxx.xxx:5051 -e MESOS_SUBSCRIPTION_BACKOFF_MAX=2secs -v /volumes/sdb1/mesos/slaves/89192f68-d28f-498c-808f-442a1ef576b3-S2/frameworks/framework-id-daily/executors/Executor_920860/runs/29c82b61-1242-4de9-80cf-16f46c30e7e3:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3 reg.docker.xxx/xxxxxx/executor:v25 -c env && cd $MESOS_SANDBOX && ./executor.sh
> I1012 16:15:01.717859 124071 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:02.033951 124085 docker.cpp:1118] Retrying inspect with non-zero status code. cmd: 'docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3', interval: 1secs
> I1012 16:15:03.034230 124090 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:03.518020 124078 docker.cpp:1071] Running docker -H unix:///var/run/docker.sock inspect mesos-89192f68-d28f-498c-808f-442a1ef576b3-S2.29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:29.554232 124076 docker.cpp:1753] Updated 'cpu.shares' to 378 at /sys/fs/cgroup/cpuset,cpu,cpuacct/docker/506757580a6fe6529e58560a60db7c7311f9411185211b14e64586b12a7a8427 for container 29c82b61-1242-4de9-80cf-16f46c30e7e3
> I1012 16:15:29.556495 124076 docker.cpp:1817] Updated 'memory.soft_limit_in_bytes' to 408MB for container 29c82b61-1242-4de9-80cf-16f46c30e7e3
> E1012 16:15:29.559406 124082 slave.cpp:5097] Container '29c82b61-1242-4de9-80cf-16f46c30e7e3' for executor 'Executor_920860' of framework framework-id-daily failed to start: Unable to get executor pid after launch
> I1012 16:15:29.559644 124068 docker.cpp:2102] Container 29c82b61-1242-4de9-80cf-16f46c30e7e3 launch failed
> I1012 16:15:29.559890 124077 slave.cpp:5210] Executor 'Executor_920860' of framework framework-id-daily has terminated with unknown status
> E1012 16:15:29.561193 124077 slave.cpp:4545] Failed to update resources for container 29c82b61-1242-4de9-80cf-16f46c30e7e3 of executor 'Executor_920860' running task 920860 on status update for terminal task, destroying container: Container not found
> W1012 16:15:29.561326 124074 composing.cpp:646] Attempted to destroy unknown container 29c82b61-1242-4de9-80cf-16f46c30e7e3
> {code}
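
The "exit 0 with pid 0" case described in the report can be illustrated with a small check on the JSON that "docker inspect" prints. This is only a sketch, not the code in Mesos's docker.cpp: the function name pid_from_inspect is hypothetical, and it assumes the usual inspect output shape (a JSON array whose first element has a State.Pid field).

```python
import json
from typing import Optional


def pid_from_inspect(output: str) -> Optional[int]:
    """Return the container's pid from `docker inspect` output, or None.

    Hypothetical helper for illustration; not part of Mesos or Docker.
    """
    try:
        containers = json.loads(output)
    except json.JSONDecodeError:
        # Empty or truncated output, e.g. from a command that was cut short.
        return None
    if not isinstance(containers, list) or not containers:
        return None
    first = containers[0]
    if not isinstance(first, dict):
        return None
    state = first.get("State")
    if not isinstance(state, dict):
        return None
    pid = state.get("Pid")
    # A pid of 0 means the container is not running (yet), so treat it
    # as "no usable pid" even though the inspect command itself succeeded.
    if isinstance(pid, int) and not isinstance(pid, bool) and pid > 0:
        return pid
    return None
```

A caller could obtain the output with subprocess.run(["docker", "inspect", name], capture_output=True, text=True, timeout=...) and retry while the result is None, so that a hanging inspect and a pid of 0 both become retryable conditions rather than an immediate launch failure. Whether that matches the actual fix for this issue is not something the report states.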



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)