Posted to issues@mesos.apache.org by "Rogier Dikkes (JIRA)" <ji...@apache.org> on 2016/10/27 15:08:58 UTC

[jira] [Comment Edited] (MESOS-6327) Large docker images causes container launch failures: Too many levels of symbolic links

    [ https://issues.apache.org/jira/browse/MESOS-6327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612156#comment-15612156 ] 

Rogier Dikkes edited comment on MESOS-6327 at 10/27/16 3:08 PM:
----------------------------------------------------------------

More information: 
Last week I created a Docker image with 21 layers, based on ubuntu:16.04 and containing a few packages. Today I updated the image to fix a typo, and it grew by about 30 MB in size (the number of layers did not change), I suspect because of package updates. Now I am running into the issue described above.

imagename  0.2.7               be78f88bb969        37 minutes ago      418.3 MB
imagename  0.2.6               2022190ada2c        7 days ago          391.9 MB

Some years ago the LXC community ran into this too; back then autofs was causing the issue. I have verified that autofs and automount are not running on the hosts.


was (Author: a-nldisr):
More information: 
Last week I created a Docker image with 21 layers, based on ubuntu:16.04 and containing a few packages. Today I updated the image to fix a typo, and it grew by about 30 MB in size (the number of layers did not change). Now I am running into the issue described above.

imagename  0.2.7               be78f88bb969        37 minutes ago      418.3 MB
imagename  0.2.6               2022190ada2c        7 days ago          391.9 MB

Some years ago the LXC community ran into this too; back then autofs was causing the issue. I have verified that autofs and automount are not running on the hosts.

> Large docker images causes container launch failures: Too many levels of symbolic links
> ---------------------------------------------------------------------------------------
>
>                 Key: MESOS-6327
>                 URL: https://issues.apache.org/jira/browse/MESOS-6327
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization, docker
>    Affects Versions: 1.0.0, 1.0.1
>         Environment: centos 7.2 (1511), ubuntu 14.04 (trusty). Replicated in the Apache Aurora vagrant image
>            Reporter: Rogier Dikkes
>            Priority: Critical
>
> When deploying Mesos containers with large (6G+, 60+ layers) Docker images, the task crashes with the following error.
> Mesos agent logs: 
> E1007 08:40:12.954227  8117 slave.cpp:3976] Container 'a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4' for executor 'thermos-www-data-devel-hello_docker_image-0-d42d2af6-6b44-4b2b-be95-e1ba93a6b365' of framework dfc91a86-84b9-4539-a7be-4ace7b7b44a1-0000 failed to start: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/a1d759ae-5bc6-4c4e-ac03-717fbb8e5da4/backends/copy/rootfses/5f328f72-25d4-4a26-ac83-8d30bbc44e97/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links
> ... (complete pastebin: http://pastebin.com/umZ4Q5d1 )
> How to replicate:
> Start the Aurora vagrant image. Set /etc/mesos-slave/executor_registration_timeout to 5mins. Edit the file /vagrant/examples/jobs/hello_docker_image.aurora so that it starts a large Docker image instead of the example image (you can use anldisr/jupyter:0.4, a test image I created that is based on the Jupyter notebook stacks). Create the job and watch it fail after a number of minutes.
> The Mesos sandbox is empty.
> Aurora errors I see:
> 28 minutes ago - FAILED : Failed to launch container: Collect failed: Collect failed: Failed to copy layer: cp: cannot stat ‘/var/lib/mesos/provisioner/containers/93420a36-0e0c-4f04-b401-74c426c25686/backends/copy/rootfses/6e185a51-7174-4b0d-a305-42b634eb91bb/usr/share/zoneinfo/right/Asia/Urumqi’: Too many levels of symbolic links cp: cannot stat ... 
> Too many levels of symbolic links ; Container destroyed while provisioning images
> (complete pastebin: http://pastebin.com/uecHYD5J )
> To rule out the image, I started this image and several others as normal Docker containers. They run without issues.
> Relevant Mesos flags configured:
> -appc_store_dir /tmp/mesos/images/appc
> -containerizers docker,mesos
> -executor_registration_timeout 5mins
> -image_providers appc,docker
> -image_provisioner_backend copy
> -isolation filesystem/linux,docker/runtime
> Affected Mesos versions tested: 1.0.1 & 1.0.0
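
The message "Too many levels of symbolic links" in the logs above is the standard text for errno ELOOP, which the kernel returns when resolving a path passes through a symlink cycle or too many nested symlinks. The ticket does not establish where the bad links come from, so the following is only a diagnostic sketch: it walks a directory tree (for example a provisioned rootfs under /var/lib/mesos/provisioner/...) and lists every symlink whose resolution fails with ELOOP. The names `find_symlink_loops` and `demo`, and the files the demo creates, are hypothetical and not part of Mesos.

```python
import errno
import os
import tempfile


def find_symlink_loops(root):
    """Return sorted paths of symlinks under root that fail to resolve with ELOOP."""
    loops = []
    # os.walk does not descend into symlinked directories by default,
    # so the traversal itself cannot get caught in a cycle.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            try:
                os.stat(path)  # follows symlinks, as a stat of the copy source does
            except OSError as exc:
                if exc.errno == errno.ELOOP:
                    loops.append(path)
                # Other errors (e.g. dangling links, ENOENT) are ignored here.
    return sorted(loops)


def demo():
    """Build a throwaway tree with one valid symlink and one two-link cycle."""
    with tempfile.TemporaryDirectory() as root:
        zone = os.path.join(root, "zoneinfo")
        os.makedirs(zone)
        open(os.path.join(zone, "UTC"), "w").close()
        os.symlink("UTC", os.path.join(zone, "ok"))  # resolves fine
        os.symlink("b", os.path.join(zone, "a"))     # a -> b
        os.symlink("a", os.path.join(zone, "b"))     # b -> a, closing the cycle
        return [os.path.relpath(p, root) for p in find_symlink_loops(root)]


if __name__ == "__main__":
    for link in demo():
        print(link)
```

Running the same function against both the Docker image's unpacked layers and the rootfs that the copy backend produced could help tell whether the cycle already exists in the image or only appears after the layers are copied on top of each other.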



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)