Posted to issues@aurora.apache.org by "Rogier Dikkes (JIRA)" <ji...@apache.org> on 2016/10/04 18:09:20 UTC

[jira] [Commented] (AURORA-1781) Sandbox taskfs setup fails (groupadd error)

    [ https://issues.apache.org/jira/browse/AURORA-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15546178#comment-15546178 ] 

Rogier Dikkes commented on AURORA-1781:
---------------------------------------

Same issue here.
OS: CentOS Linux release 7.2.1511 (Core)
Aurora version: 0.16.0
Mesos version: 1.0.1

As a test, I used hello_docker_image.aurora from https://github.com/apache/aurora/tree/master/examples/jobs

I built the Aurora RPMs with the aurora-packaging repository, using the 0.16.0 source distribution for all packages.

The error: 
8 minutes ago - FAILED : Failed to initialize sandbox: Failed to create group in sandbox for task image: Command '['groupadd', '-R', '/var/lib/mesos/slaves/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074/frameworks/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000/executors/thermos-blauser-prod-hello_docker_image-0-f8232fb7-be9c-4910-bbb8-136ba369ce3f/runs/8bddc079-9a6d-4047-afe6-d4969dad2d4d/taskfs', '-g', '1000', 'blauser']' returned non-zero exit status 10

When using the Vagrant image I did not run into this issue.

The Mesos log shows:
I1004 18:07:38.698328 108146 fetcher.cpp:498] Fetcher Info: {"cache_directory":"\/tmp\/mesos\/fetch\/slaves\/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074\/root","items":[{"action":"BYPASS_CACHE","uri":{"executable":true,"extract":true,"value":"\/usr\/bin\/thermos_executor"}}],"sandbox_directory":"\/var\/lib\/mesos\/slaves\/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074\/frameworks\/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000\/executors\/thermos-blauser-prod-hello_docker_image-0-0639d3f6-5fab-4154-bef6-304d82a26de1\/runs\/831a4a74-6053-42df-b830-77660e5125c5","user":"root"}
I1004 18:07:38.703634 108146 fetcher.cpp:409] Fetching URI '/usr/bin/thermos_executor'
I1004 18:07:38.703665 108146 fetcher.cpp:250] Fetching directly into the sandbox directory
I1004 18:07:38.703697 108146 fetcher.cpp:187] Fetching URI '/usr/bin/thermos_executor'
I1004 18:07:38.703718 108146 fetcher.cpp:167] Copying resource with command:cp '/usr/bin/thermos_executor' '/var/lib/mesos/slaves/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074/frameworks/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000/executors/thermos-blauser-prod-hello_docker_image-0-0639d3f6-5fab-4154-bef6-304d82a26de1/runs/831a4a74-6053-42df-b830-77660e5125c5/thermos_executor'
I1004 18:07:38.718241 108146 fetcher.cpp:547] Fetched '/usr/bin/thermos_executor' to '/var/lib/mesos/slaves/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074/frameworks/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000/executors/thermos-blauser-prod-hello_docker_image-0-0639d3f6-5fab-4154-bef6-304d82a26de1/runs/831a4a74-6053-42df-b830-77660e5125c5/thermos_executor'
twitter.common.app debug: Initializing: twitter.common.log (Logging subsystem.)
Writing log files to disk in /var/lib/mesos/slaves/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074/frameworks/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000/executors/thermos-blauser-prod-hello_docker_image-0-0639d3f6-5fab-4154-bef6-304d82a26de1/runs/831a4a74-6053-42df-b830-77660e5125c5
I1004 18:07:39.536164 108143 exec.cpp:161] Version: 1.0.0
I1004 18:07:39.548815 108199 exec.cpp:236] Executor registered on agent ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074
groupadd: failure while writing changes to /etc/group
FATAL] Failed to initialize sandbox: Failed to create group in sandbox for task image: Command '['groupadd', '-R', '/var/lib/mesos/slaves/ab28b3ed-85d1-4bce-898e-e57a5f332762-S2074/frameworks/ab28b3ed-85d1-4bce-898e-e57a5f332762-0000/executors/thermos-blauser-prod-hello_docker_image-0-0639d3f6-5fab-4154-bef6-304d82a26de1/runs/831a4a74-6053-42df-b830-77660e5125c5/taskfs', '-g', '1000', 'blauser']' returned non-zero exit status 10
twitter.common.app debug: Shutting application down.
twitter.common.app debug: Running exit function for twitter.common.log (Logging subsystem.)
twitter.common.app debug: Finishing up module teardown.
twitter.common.app debug:   Active thread: <_MainThread(MainThread, started 140211855935296)>
twitter.common.app debug:   Active thread (daemon): <_DummyThread(Dummy-2, started daemon 140211681986304)>
twitter.common.app debug: Exiting cleanly.
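
For anyone triaging the same failure: the exit status in the message above can be decoded against the groupadd(8) man page, where status 10 means "can't update group file". That matches the "groupadd: failure while writing changes to /etc/group" line in the log. Below is a minimal sketch of that decoding. It is not part of Aurora or Thermos; the function name explain_groupadd_failure and the regular expression are assumptions made for illustration, and they only cover the message format shown in this log.

```python
import ast
import re

# Exit statuses as listed in the groupadd(8) man page (shadow-utils).
GROUPADD_EXIT_CODES = {
    0: "success",
    2: "invalid command syntax",
    3: "invalid argument to option",
    4: "GID not unique (when -o not used)",
    9: "group name not unique",
    10: "can't update group file",
}

# Assumed pattern: matches the "Command '[...]' returned non-zero exit
# status N" text that appears in the executor's failure message.
_FAILURE_RE = re.compile(
    r"Command '(?P<argv>\[.*?\])' returned non-zero exit status (?P<status>\d+)"
)


def explain_groupadd_failure(message):
    """Return the argv, chroot dir, exit status and its meaning, or None.

    Hypothetical helper for reading a log line; it does not run groupadd.
    """
    match = _FAILURE_RE.search(message)
    if match is None:
        return None

    # The argv is printed as a Python list literal, so it can be parsed safely.
    argv = ast.literal_eval(match.group("argv"))
    status = int(match.group("status"))

    # groupadd -R CHROOT_DIR applies the change inside CHROOT_DIR (the taskfs).
    chroot = None
    if "-R" in argv and argv.index("-R") + 1 < len(argv):
        chroot = argv[argv.index("-R") + 1]

    return {
        "argv": argv,
        "chroot": chroot,
        "status": status,
        "meaning": GROUPADD_EXIT_CODES.get(status, "unknown exit status"),
    }
```

Feeding it the FATAL line from the log gives the taskfs path as the chroot directory and "can't update group file" as the meaning, which points at /etc/group inside the taskfs rather than at the GID or group name.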

> Sandbox taskfs setup fails (groupadd error)
> -------------------------------------------
>
>                 Key: AURORA-1781
>                 URL: https://issues.apache.org/jira/browse/AURORA-1781
>             Project: Aurora
>          Issue Type: Bug
>    Affects Versions: 0.16.0
>            Reporter: Justin Venus
>
> I hit what smells like a permission issue w/ `/etc/group` when trying to use a docker-image (unified containerizer setup) with mesos-1.0.0 and aurora-0.16.0-rc2.  I cannot reproduce the issue w/ mesos-0.28.2 and aurora-0.15.0.
> {code}
> Failed to initialize sandbox: Failed to create group in sandbox for task image: Command '['groupadd', '-R', '/var/lib/mesos/slaves/5d28d0cc-2793-4471-82d5-e67276c53f70-S2/frameworks/20160221-001235-3801519626-5050-1-0000/executors/thermos-nobody-prod-jenkins-0-47cc7824-565b-4265-9ab4-9ba3f364ebed/runs/a3f78288-4865-4166-8685-1ad941562f2f/taskfs', '-g', '99', 'nobody']' returned non-zero exit status 10
> {code}
> {code}
> [root@mesos-master01of2 taskfs]# pwd
> /var/lib/mesos/slaves/5d28d0cc-2793-4471-82d5-e67276c53f70-S2/frameworks/20160221-001235-3801519626-5050-1-0000/executors/thermos-nobody-prod-jenkins-0-47cc7824-565b-4265-9ab4-9ba3f364ebed/runs/a3f78288-4865-4166-8685-1ad941562f2f/taskfs
> [root@mesos-master01of2 taskfs]# groupadd -R $PWD -g 99 nobody
> groupadd: cannot lock /etc/group; try again later.
> {code}
> Maybe related to AURORA-1761
> I'm running CoreOS with the mesos-agent (and thermos) inside docker.  Here is the gist of how it's started.
> {code}
> /usr/bin/sh -c "exec /usr/bin/docker run \
>     --name=mesos_slave \
>     --net=host \
>     --pid=host \
>     --privileged \
>     -v /sys:/sys \
>     -v /usr/bin/docker:/usr/bin/docker:ro \
>     -v /var/lib/docker:/var/lib/docker \
>     -v /var/run/docker.sock:/root/docker.sock \
>     -v /run/systemd/system:/run/systemd/system \
>     -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
>     -v /sys/fs/cgroup:/sys/fs/cgroup \
>     -v /var/lib/mesos:/var/lib/mesos \
>     -e MESOS_CONTAINERIZERS=docker,mesos \
>     -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \
>     -e MESOS_WORK_DIR=/var/lib/mesos \
>     -e MESOS_LOGGING_LEVEL=INFO \
>     -e AMAZON_REGION=us-office-2 \
>     -e AVAILABILITY_ZONE=us-office-2b \
>     -e MESOS_ATTRIBUTES=\"platform:linux;host:$(hostname);rack:us-office-2b\" \
>     -e MESOS_CLUSTER=ZeroZero \
>     -e MESOS_DOCKER_SOCKET=/root/docker.sock \
>     -e MESOS_MASTER=zk://10.150.150.224:2181,10.150.150.225:2181,10.150.150.226:2181/mesos \
>     -e MESOS_LOG_DIR=/var/log/mesos \
>     -e MESOS_ISOLATION=\"filesystem/linux,cgroups/cpu,cgroups/mem,docker/runtime\" \
>     -e MESOS_IMAGE_PROVIDERS=docker \
>     -e MESOS_IMAGE_PROVISIONER_BACKEND=copy \
>     -e MESOS_DOCKER_REGISTRY=http://docker-registry:31000 \
>     -e MESOS_DOCKER_STORE_DIR=/var/lib/mesos/docker \
>     --entrypoint=/usr/sbin/mesos-slave \
>     docker-registry.thebrighttag.com:31000/mesos:latest \
>         --no-systemd_enable_support \
>     || rm -f /var/lib/mesos/meta/slaves/latest"
> {code}


