Posted to issues@mesos.apache.org by "Chris Fortier (JIRA)" <ji...@apache.org> on 2015/08/28 00:04:45 UTC

[jira] [Updated] (MESOS-3325) Running mesos-slave@0.23 in a container causes slave to be lost after a restart

     [ https://issues.apache.org/jira/browse/MESOS-3325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris Fortier updated MESOS-3325:
---------------------------------
    Description: 
We are attempting to run mesos-slave 0.23 in a container. However, the mesos-slave agent appears to register as a new slave instead of re-registering after a restart. As a result, the previously launched tasks continue running.

systemd unit being used:

```
[Unit]
Description=MesosSlave
After=docker.service dockercfg.service
Requires=docker.service dockercfg.service

[Service]
Environment=MESOS_IMAGE=mesosphere/mesos-slave:0.23.0-1.0.ubuntu1404
Environment=ZOOKEEPER=redacted
User=core
KillMode=process
Restart=always
RestartSec=20
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill mesos_slave
ExecStartPre=-/usr/bin/docker rm mesos_slave
ExecStartPre=/usr/bin/docker pull ${MESOS_IMAGE}
ExecStart=/usr/bin/sh -c "sudo /usr/bin/docker run \
    --name=mesos_slave \
    --net=host \
    --pid=host \
    --privileged \
    -v /home/core/.dockercfg:/root/.dockercfg:ro \
    -v /sys:/sys \
    -v /usr/bin/docker:/usr/bin/docker:ro \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
    -v /var/lib/mesos/slave:/var/lib/mesos/slave \
    ${MESOS_IMAGE} \
    --ip=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4` \
    --attributes=zone:$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)\;os:coreos \
    --containerizers=docker,mesos \
    --executor_registration_timeout=10mins \
    --hostname=`curl -s http://169.254.169.254/latest/meta-data/public-hostname` \
    --log_dir=/var/log/mesos \
    --master=zk://${ZOOKEEPER}/mesos \
    --work_dir=/var/lib/mesos/slave"
ExecStop=/usr/bin/docker stop mesos_slave

[Install]
WantedBy=multi-user.target

[X-Fleet]
Global=true
MachineMetadata=role=worker
```
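
One way to check whether the agent came back under a new ID is to compare the checkpointed slave ID before and after a restart. The following is only a sketch: it assumes the agent checkpoints its ID as a `meta/slaves/latest` symlink under `--work_dir` (the path that appears in the earlier revision of this unit), and the function name `current_slave_id` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the slave ID recorded under the agent's work_dir.
# Assumption: <work_dir>/meta/slaves/latest is a symlink to a directory
# named after the slave ID. Prints "none" when no such symlink exists.
current_slave_id() {
    # Default to the --work_dir value used in the unit above.
    meta_dir="${1:-/var/lib/mesos/slave}/meta/slaves"
    if [ -L "$meta_dir/latest" ]; then
        # The symlink target's last path component is taken as the slave ID.
        basename "$(readlink "$meta_dir/latest")"
    else
        echo "none"
    fi
}
```

Calling `current_slave_id` on the host before the restart and again once the agent is back gives two values to compare. If they differ, that would be consistent with the agent registering as a new slave rather than re-registering.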


P.S. Yes, I saw that the coreos-setup repo was deprecated.

  was:
We are attempting to run mesos-slave 0.23 in a container. However, the mesos-slave agent appears to register as a new slave instead of re-registering after a restart. As a result, the previously launched tasks continue running.

systemd unit being used:

```
[Unit]
Description=MesosSlave
After=docker.service dockercfg.service
Requires=docker.service dockercfg.service

[Service]
Environment=MESOS_IMAGE=mesosphere/mesos-slave:0.23.0-1.0.ubuntu1404
Environment=ZOOKEEPER=redacted
User=core
KillMode=process
Restart=always
RestartSec=20
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill mesos_slave
ExecStartPre=-/usr/bin/docker rm mesos_slave
ExecStartPre=/usr/bin/sudo /usr/bin/rm -f /var/lib/mesos/slave/meta/slaves/latest
ExecStartPre=/usr/bin/docker pull ${MESOS_IMAGE}
ExecStart=/usr/bin/sh -c "sudo /usr/bin/docker run \
    --name=mesos_slave \
    --net=host \
    --pid=host \
    --privileged \
    -v /home/core/.dockercfg:/root/.dockercfg:ro \
    -v /sys:/sys \
    -v /usr/bin/docker:/usr/bin/docker:ro \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /lib64/libdevmapper.so.1.02:/lib/libdevmapper.so.1.02:ro \
    -v /var/lib/mesos/slave:/var/lib/mesos/slave \
    ${MESOS_IMAGE} \
    --ip=`curl -s http://169.254.169.254/latest/meta-data/local-ipv4` \
    --attributes=zone:$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)\;os:coreos \
    --containerizers=docker,mesos \
    --executor_registration_timeout=10mins \
    --hostname=`curl -s http://169.254.169.254/latest/meta-data/public-hostname` \
    --log_dir=/var/log/mesos \
    --master=zk://${ZOOKEEPER}/mesos \
    --work_dir=/var/lib/mesos/slave"
ExecStop=/usr/bin/docker stop mesos_slave

[Install]
WantedBy=multi-user.target

[X-Fleet]
Global=true
MachineMetadata=role=worker
```


P.S. Yes, I saw that the coreos-setup repo was deprecated.


> Running mesos-slave@0.23 in a container causes slave to be lost after a restart
> -------------------------------------------------------------------------------
>
>                 Key: MESOS-3325
>                 URL: https://issues.apache.org/jira/browse/MESOS-3325
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>    Affects Versions: 0.23.0
>         Environment: CoreOS, Container, Docker
>            Reporter: Chris Fortier
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)