Posted to user@mesos.apache.org by Xiaodong Zhang <xd...@alauda.io> on 2017/02/20 17:13:30 UTC

Re: Running mesos-slave in the docker that leave many zombie process

Hi guys.

I tried the fix for zombie containers suggested in this thread. It works well when I restart mesos-slave: no zombie containers appear. But it only helps when restarting mesos-slave.

If I restart the executor, the executor quits, and the container that the executor started becomes a zombie container.

Any ideas about this? My Mesos version is 0.28.

Here are the steps I followed (screenshots were attached to the original email):

  1. Start a container.

  2. Restart mesos-slave. Everything is OK.

  3. Kill the executor container. A zombie container appears.

How can I fix this?

Thanks,
Xiaodong

From: tommy xiao <xi...@gmail.com>
Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
Date: Tuesday, November 22, 2016, 12:32 AM
To: user <us...@mesos.apache.org>
Subject: Re: Running mesos-slave in the docker that leave many zombie process

You need `--pid=host`.

2016-11-21 15:01 GMT+08:00 X Brick <ng...@gmail.com>:
Thanks @haosdent, let me try it.

2016-11-21 14:33 GMT+08:00 haosdent <ha...@gmail.com>:
Passing the `--pid=host` flag when starting the Docker container may resolve this.
> Start the mesos_slave container with "--pid=host" so that it uses the process namespace of the host.
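
As a rough illustration of the suggestion above, a `docker run` invocation with `--pid=host` might look like the sketch below. Only `--pid=host` comes from this thread; the image name, the container name and `--net=host` are assumptions chosen for illustration. The command is built by a small function that prints it, so it can be reviewed before anything is started.

```shell
# Build the docker command line for an agent container that shares the
# host's PID namespace. The image name is a placeholder (assumption).
agent_run_cmd() {
    image="${1:-example/mesos-slave}"   # hypothetical image name
    # --pid=host: use the host's process namespace instead of a private one.
    # --net=host is an assumption here, not something stated in the thread.
    printf 'docker run -d --name mesos-slave --pid=host --net=host %s\n' "$image"
}

# Print the command for review; pipe it to `sh` to actually run it.
agent_run_cmd
```

A real agent would also need its usual mesos-slave arguments and volume mounts, which are left out here.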

On Mon, Nov 21, 2016 at 2:30 PM, haosdent <ha...@gmail.com> wrote:
Not sure if it is related to this issue: https://github.com/mesosphere/docker-containers/issues/9

On Mon, Nov 21, 2016 at 12:27 PM, X Brick <ng...@gmail.com> wrote:

Hi,

I ran into a problem when running mesos-slave in Docker. It leaves zombie processes like these:

```
root     10547 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     14505 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     16069 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     19962 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     23346 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     24544 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
```

And I found that the zombies come from the mesos-slave process:

```
pstree -p -s 10547
systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
```

The logs were deleted by a cron job a few weeks ago, but I remember many `Failed to shutdown socket with fd xx: Transport endpoint is not connected` messages in the log.

I reported this in JIRA: https://issues.apache.org/jira/browse/MESOS-6615

Has anyone seen this issue before?



--
Best Regards,
Haosdent Huang







--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com

Re: Running mesos-slave in the docker that leave many zombie process

Posted by haosdent <ha...@gmail.com>.
Sorry, there was a typo in my last message.

s/those containers they don't wait/those containers it could not recover/.

On Fri, Feb 24, 2017 at 12:52 AM, haosdent <ha...@gmail.com> wrote:

> Hi, @xiaodong
>
> > If I restart mesos-slave, the zombie container is gone.
>
> Yep, during the recovery stage, mesos-agent would remove those containers
> they don't wait
>
> > Remove executor
>
> What do you mean by "remove executor"? Do you mean killing
> `mesos-docker-executor`?
> If so, this is expected behavior, because the docker container is reaped
> by mesos-docker-executor; if you kill the executor, nothing is left to
> reap the container.
> The Mesos agent isn't aware of the status of the docker container (the
> status is checked by mesos-docker-executor, which sends task statuses to
> the Mesos agent).
>
>
> On Thu, Feb 23, 2017 at 10:44 AM, Xiaodong Zhang <xd...@alauda.io>
> wrote:
>
>> What can I do about this? Do you need more info?
>>
>> From: Xiaodong Zhang <xd...@alauda.io>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Tuesday, February 21, 2017, 6:18 PM
>> To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Subject: Re: Running mesos-slave in the docker that leave many zombie process
>>
>> Hi @Haosdent, thanks for your reply.
>> I tried 1.0.3 and 1.1.0. Both have the same problem.
>>
>>   1. Create container.
>>
>>   2. Restart container. Works well.
>>
>>   3. Remove executor.
>>
>> If I restart mesos-slave, the zombie container is gone.
>>
>> Any thoughts?
>>
>> Thanks,
>> Xiaodong
>>
>> From: haosdent <ha...@gmail.com>
>> Reply-To: "user@mesos.apache.org" <us...@mesos.apache.org>
>> Date: Tuesday, February 21, 2017, 1:19 AM
>> To: user <us...@mesos.apache.org>
>> Subject: Re: Running mesos-slave in the docker that leave many zombie process
>>
>> Hi @xiaodong, could you check whether this problem still exists after
>> 1.0? I remember Mesos changed the recovery for docker containers to
>> avoid this after 1.0.
>>



-- 
Best Regards,
Haosdent Huang

Re: Running mesos-slave in the docker that leave many zombie process

Posted by haosdent <ha...@gmail.com>.
> If mesos-docker-executor exits by itself, what does mesos-slave do?
I just tried it (killed the executor), and I got a TASK_LOST.

On Fri, Feb 24, 2017 at 10:27 AM, haosdent <ha...@gmail.com> wrote:

> > Mesos still can't prevent zombie containers from occurring, right?
>
> I don't think so. Usually mesos-docker-executor would not exit while the
> corresponding container is running unless you kill it. If it exits
> unexpectedly, it should be a bug.
>
> On Fri, Feb 24, 2017 at 10:17 AM, Xiaodong Zhang <xd...@alauda.io>
> wrote:
>
>> Hi Haosdent,
>>
>> Thanks for your reply.
>>
>> > What do you mean by "remove executor"? Do you mean killing
>> > `mesos-docker-executor`?
>> > If so, this is expected behavior, because the docker container is
>> > reaped by mesos-docker-executor; if you kill the executor, nothing is
>> > left to reap the container.
>> > The Mesos agent isn't aware of the status of the docker container (the
>> > status is checked by mesos-docker-executor, which sends task statuses
>> > to the Mesos agent).
>>
>> If this is expected behavior, then Mesos still can't prevent zombie
>> containers from occurring, right? If mesos-docker-executor exits by
>> itself, what does mesos-slave do?
>>



-- 
Best Regards,
Haosdent Huang

Re: Running mesos-slave in the docker that leave many zombie process

Posted by Xiaodong Zhang <xd...@alauda.io>.
Thank you for your quick response, @Haosdent!

This clears up my confusion. I will do more tests.

Best regards.

Xiaodong


Re: Running mesos-slave in the docker that leave many zombie process

Posted by haosdent <ha...@gmail.com>.
> Mesos still can't prevent zombie containers from occurring, right?

I don't think so. Usually mesos-docker-executor would not exit while the
corresponding container is running unless you kill it. If it exits
unexpectedly, it should be a bug.




-- 
Best Regards,
Haosdent Huang

Re: Running mesos-slave in the docker that leave many zombie process

Posted by Xiaodong Zhang <xd...@alauda.io>.
Hi Haosdent,

Thanks for your reply.

> What do you mean by "remove executor"? Do you mean killing `mesos-docker-executor`?
> If so, this is expected behavior, because the docker container is reaped by mesos-docker-executor; if you kill the executor, nothing is left to reap the container.
> The Mesos agent isn't aware of the status of the docker container (the status is checked by mesos-docker-executor, which sends task statuses to the Mesos agent).

If this is expected behavior, then Mesos still can't prevent zombie containers from occurring, right? If mesos-docker-executor exits by itself, what does mesos-slave do?

发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2017年2月24日 星期五 上午12:52
至: user <us...@mesos.apache.org>>
主题: Re: Running mesos-slave in the docker that leave many zombie process

Hi, @xiaodong

>If I restart mesos-slave. Then the zombie container are gone.

Yep, during recovering stage, mesos-agent would remove those containers they don't wait

> Remove executor

What you mean about "remove executor", do you mean kill `mesos-docker-executor`?
If you mean this, it is an expected behavior. Because the docker container is reaped by mesos-docker-executor, if you kill it.
Mesos agent isn't aware of the status of docker container (the status of the docker container is checked by mesos-docker-executor and send task statuses to the mesos agent ).


On Thu, Feb 23, 2017 at 10:44 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:
What can I do for this. Do you guys need more info?

发件人: Xiaodong Zhang <xd...@alauda.io>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2017年2月21日 星期二 下午6:18
至: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>

主题: Re: Running mesos-slave in the docker that leave many zombie process

Hi @Haosdent thanks for your reply.
I tried 1.0.3, 1.1.0. They both have the same problem.

  1.  Create container.

[cid:89116669-04F7-4B75-A1C5-9067FC6CE59D]

  2. Restart container. Works well.

  3. Remove executor
     [cid:2F546391-5FD8-4676-ADE3-35F29B8F7326]

If I restart mesos-slave. Then the zombie container gone.

Any thoughts?

Thanks,
Xiaodong

发件人: haosdent <ha...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2017年2月21日 星期二 上午1:19
至: user <us...@mesos.apache.org>>
主题: Re: Running mesos-slave in the docker that leave many zombie process

Hi, @xiaodong May you try if this problem still exists after 1.0? I remember Mesos change the recovery for docker containers to avoid this after 1.0.

On Tue, Feb 21, 2017 at 1:13 AM, Xiaodong Zhang <xd...@alauda.io>> wrote:
Hi guys.

I try to fix zombie container as this email. It works well when I restart mesos-slave. No zombie containers occur. But this  just works on  restarting mesos-slave.

If I restart the executor, the executor will quit, and the container which executor start, will be a zombie container.

Any idea about this? My mesos version is 0.28.

Here is some pic:


  1.  Start a container.

[cid:C1C8B5B9-DE97-486B-9512-117DCD305B3F]

2. Restart mesos-slave. Everything is ok.
[cid:41C46ACA-5234-418E-8ACC-B1FB50533AF4]

3. Kill the executor container, zombie container occur.

[cid:B6685AC2-E65B-4F16-AC3B-7C4636C96D10]

How can I fix this?

Thanks,
Xiaodong

发件人: tommy xiao <xi...@gmail.com>>
答复: "user@mesos.apache.org<ma...@mesos.apache.org>" <us...@mesos.apache.org>>
日期: 2016年11月22日 星期二 上午12:32
至: user <us...@mesos.apache.org>>
主题: Re: Running mesos-slave in the docker that leave many zombie process

you need it  --pid=host

2016-11-21 15:01 GMT+08:00 X Brick <ng...@gmail.com>>:
Thanks @haosdent, let me try it.

2016-11-21 14:33 GMT+08:00 haosdent <ha...@gmail.com>>:
Pass the `--pid=host` flag when starting the docker container  may resolve this.
>start the mesos_slave container with "--pid=host" so that it uses the process namespace of the host.

On Mon, Nov 21, 2016 at 2:30 PM, haosdent <ha...@gmail.com>> wrote:
No sure if it related to this issue https://github.com/mesosphere/docker-containers/issues/9

On Mon, Nov 21, 2016 at 12:27 PM, X Brick <ng...@gmail.com>> wrote:

Hi,

I meet a problem when running mesos-slave in the docker. Here are some zombie process in this way.

```
root     10547 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     14505 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     16069 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     19962 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     23346 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
root     24544 19464  0 Oct25 ?        00:00:00 [docker] <defunct>
```

And I find the zombies come from mesos-slave process:

```
pstree -p -s 10547
systemd(1)───docker-containe(19448)───mesos-slave(19464)───docker(10547)
```
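
A minimal sketch of how a listing like the ones above could be narrowed down to zombies and their parent PIDs. The function name `list_zombies` is made up for illustration, and the input format (PID, PPID, STAT, COMMAND columns) is an assumption matching the `ps` invocation shown in the comment.

```shell
#!/bin/sh
# Keep only rows whose STAT column starts with Z (zombie),
# and print PID, PPID and command name.
list_zombies() {
  awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# Live usage on Linux:
#   ps -e -o pid= -o ppid= -o stat= -o comm= | list_zombies
# Sample input modelled on the listing above:
printf '%s\n' \
  '10547 19464 Z docker' \
  '19464 19448 Sl mesos-slave' | list_zombies
```

The PPID column is what points back at the process that has not yet reaped its children, here mesos-slave.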

The logs were deleted by a cron job a few weeks ago, but I remember many `Failed to shutdown socket with fd xx: Transport endpoint is not connected` messages in the log.

I reported this in JIRA: https://issues.apache.org/jira/browse/MESOS-6615

Has anyone seen this issue before?



--
Best Regards,
Haosdent Huang




--
Deshi Xiao
Twitter: xds2000
E-mail: xiaods(AT)gmail.com




Re: Running mesos-slave in the docker that leave many zombie process

Posted by haosdent <ha...@gmail.com>.
Hi, @xiaodong

>If I restart mesos-slave. Then the zombie container are gone.

Yep, during the recovery stage, mesos-agent would remove those containers
they don't wait

> Remove executor

What do you mean by "remove executor"? Do you mean killing
`mesos-docker-executor`?
If so, this is expected behavior. The docker container is reaped by
mesos-docker-executor, so if you kill the executor, nothing is left to reap it.
The Mesos agent isn't aware of the status of the docker container (the status of
the docker container is checked by mesos-docker-executor, which sends task
status updates to the Mesos agent).
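
To illustrate what "reaped" means here, a generic shell sketch (not Mesos code): a parent starts a child and collects its exit status with `wait`. Until the parent does that, an exited child stays in the process table as `<defunct>`. The function name `run_and_reap` and the exit code 7 are arbitrary choices for the example.

```shell
#!/bin/sh
# The parent starts a child in the background, then reaps it with `wait`.
# If the parent never waited, the exited child would remain a zombie.
run_and_reap() {
  sh -c 'exit 7' &
  child=$!
  wait "$child"
  echo "child $child exited with status $?"
}

run_and_reap
```

By analogy, once the process that was supposed to do the waiting is killed, nothing collects the child's status any more.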





-- 
Best Regards,
Haosdent Huang

Re: Running mesos-slave in the docker that leave many zombie process

Posted by Xiaodong Zhang <xd...@alauda.io>.
What can I do about this? Do you need more info?


Re: Running mesos-slave in the docker that leave many zombie process

Posted by Xiaodong Zhang <xd...@alauda.io>.
Hi guys,

About this issue:

What can I do, or is there any other info I can offer?

Thanks,
Xiaodong


Re: Running mesos-slave in the docker that leave many zombie process

Posted by Xiaodong Zhang <xd...@alauda.io>.
Hi @Haosdent, thanks for your reply.
I tried 1.0.3 and 1.1.0. Both have the same problem.

1. Create container.
   [screenshot]

2. Restart container. Works well.

3. Remove executor.
   [screenshot]

If I restart mesos-slave, the zombie container is gone.

Any thoughts?

Thanks,
Xiaodong


Re: Running mesos-slave in the docker that leave many zombie process

Posted by haosdent <ha...@gmail.com>.
Hi, @xiaodong. Could you check whether this problem still exists after 1.0? I
remember Mesos changed the recovery of docker containers to avoid this
after 1.0.




-- 
Best Regards,
Haosdent Huang