Posted to issues@mesos.apache.org by "Yu Yang (JIRA)" <ji...@apache.org> on 2017/04/01 01:38:41 UTC

[jira] [Commented] (MESOS-6810) Tasks getting stuck in STAGING state when using unified containerizer

    [ https://issues.apache.org/jira/browse/MESOS-6810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15951914#comment-15951914 ] 

Yu Yang commented on MESOS-6810:
--------------------------------

Sorry for forgetting to post my solution here.

This error is caused by a connectivity problem between the Mesos cluster and the Docker registry, so the fix is straightforward: if you are in China, you may need to deploy a Docker registry mirror or a private Docker registry. Third-party services such as DaoCloud or Aliyun also work. Test a few, pick the one that works best for you, and then point the {{--docker_registry}} agent flag at it. Increasing the value of {{--registry_fetch_timeout}} also helps when your network is not stable.
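For reference, a minimal sketch of the agent command line with those two flags set, reusing the flags from the environment below; the mirror URL is only a placeholder, substitute whichever mirror or private registry you end up choosing:

{code}
# Hypothetical example: point the unified containerizer's image fetcher at a
# registry mirror and give slow networks a longer pull window (default is 1mins).
sudo ./bin/mesos-agent.sh \
  --master=192.168.1.192:5050 \
  --work_dir=/tmp/mesos_slave \
  --image_providers=docker \
  --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia \
  --containerizers=mesos,docker \
  --executor_environment_variables="{}" \
  --docker_registry=https://registry.example.com \
  --registry_fetch_timeout=5mins
{code}

A quick {{curl -v https://registry-1.docker.io/v2/}} (or the same against your chosen mirror) from the agent host is an easy way to confirm whether the registry is reachable at all before changing any flags.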

> Tasks getting stuck in STAGING state when using unified containerizer
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6810
>                 URL: https://issues.apache.org/jira/browse/MESOS-6810
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1, 1.1.0
>         Environment: *OS*: Ubuntu 16.04, 64-bit
> *mesos*: 1.1.0, one master and one agent on the same machine
> *Agent flag*: {{sudo ./bin/mesos-agent.sh --master=192.168.1.192:5050 --work_dir=/tmp/mesos_slave --image_providers=docker --isolation=docker/runtime,filesystem/linux,cgroups/devices,gpu/nvidia --containerizers=mesos,docker --executor_environment_variables="{}"}}
>            Reporter: Yu Yang
>
> When submitting a task with container settings like:
> {code}
> {
>     "container": {
>         "mesos": {
>             "image": {
>                 "docker": {
>                     "name": "nvidia/cuda"
>                 },
>                 "type": "DOCKER"
>             }
>         },
>         "type": "MESOS"
>     }
> }
> {code}
> the task gets stuck in the STAGING state and eventually fails with the message {{Failed to launch container: Collect failed: Failed to perform 'curl': curl: (56) GnuTLS recv error (-54): Error in pull function}}. This is the related log on the agent:
> {code}
> I1217 13:05:35.406365 20780 slave.cpp:1539] Got assigned task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406749 20780 slave.cpp:1701] Launching task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.406970 20780 paths.cpp:536] Trying to chown '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c' to user 'root'
> I1217 13:05:35.409272 20780 slave.cpp:6179] Launching executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos_slave/slaves/02083c57-b2d9-4054-babe-90e962816813-S0/frameworks/02083c57-b2d9-4054-babe-90e962816813-0001/executors/mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591/runs/8be3b5cd-afa3-4189-aa2a-f09d73529f8c'
> I1217 13:05:35.409958 20780 slave.cpp:1987] Queued task 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:35.410163 20779 docker.cpp:1000] Skipping non-docker container
> I1217 13:05:35.410636 20776 containerizer.cpp:938] Starting container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c for executor 'mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001
> I1217 13:05:44.459362 20778 slave.cpp:4992] Terminating executor ''cuda_mesos_nvidia_tf.72e9b9cf-8220-49bd-86fe-1667ee5e7a02' of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 1mins
> I1217 13:05:53.586819 20780 slave.cpp:5044] Current disk usage 63.59%. Max allowed age: 1.848503351525151days
> I1217 13:06:35.410905 20777 slave.cpp:4992] Terminating executor ''mesos_containerizer_test.2a845a72-7b54-4a95-b6fa-6aeda8c6b591' of framework 02083c57-b2d9-4054-babe-90e962816813-0001' because it did not register within 1mins
> I1217 13:06:35.411175 20780 containerizer.cpp:1950] Destroying container 8be3b5cd-afa3-4189-aa2a-f09d73529f8c in PROVISIONING state
> {code}


