Posted to user@mesos.apache.org by Giulio Eulisse <Gi...@cern.ch> on 2015/03/23 17:52:42 UTC

Mesos slaves connecting but not active.

Hi,

I'm running 0.20.1 and I seem to be having trouble because a 
Mesos slave is not able to recover the docker containers after a 
restart, resulting in a very long wait.

Is this a known issue?

-- 
Ciao,
Giulio

Re: Mesos slaves connecting but not active.

Posted by Giulio Eulisse <Gi...@cern.ch>.
Ciao,

> How many containers are you running, and what is your system like?

I have about a dozen slaves, with 2 or 3 containers per slave. I'm 
running on a CentOS 6 derived distribution (Scientific Linux CERN). On 
the specific slave in question I do not have any running containers:

```
[root@cmsbuild11 ~]# docker ps -q  | wc
       0       0       0
```

but I do have a bunch of dead ones:

```
[root@cmsbuild11 ~]# docker ps -qa  | wc
     999     999   12987
```

due to some runaway process.
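
A minimal sketch of how the dead containers counted above could be removed in small batches, so that one slow "docker rm" call does not hold up the whole cleanup. The function name `remove_exited`, the `DOCKER` override variable and the default batch size are hypothetical, introduced only for illustration. The sketch assumes the dead containers show up under the `status=exited` filter and that GNU `xargs` (for `-r`) is available.

```shell
#!/bin/sh
# Hypothetical helper, not part of Docker or Mesos.
# DOCKER can be overridden (for example for a dry run); defaults to the docker CLI.
DOCKER="${DOCKER:-docker}"

remove_exited() {
    batch="${1:-20}"   # containers per "docker rm" call (assumed default)
    # List the IDs of exited containers and hand them to "docker rm" in groups.
    # "xargs -r" skips the call entirely when there is nothing to remove.
    "$DOCKER" ps -aq -f status=exited | xargs -r -n "$batch" "$DOCKER" rm
}
```

Calling `remove_exited 50` would then pass at most 50 container IDs to each "docker rm" invocation.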

By attaching via gdb to the docker daemon I get:

```
#0  0x00000000005b0ad4 in syscall.Syscall ()
#1  0x000000000084f91b in github.com/docker/docker/pkg/devicemapper.ioctlBlkDiscard ()
#2  0x0000000000000010 in ?? ()
#3  0x000000000000000b in ?? ()
#4  0x0000000000001277 in ?? ()
#5  0x00007f06d004e128 in ?? ()
#6  0x000000c209341e68 in ?? ()
#7  0x00007f06d004e140 in ?? ()
#8  0x0000000000000018 in ?? ()
#9  0x000000c209341e40 in ?? ()
#10 0x0000000000000000 in ?? ()
```

for a few of the running threads (the other ones are blocked on a 
futex). Note that I'm running on a Ceph volume.
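
A backtrace like the one above can also be collected non-interactively for every thread at once. The following is only a sketch: the function name `dump_threads` and the `GDB` override variable are hypothetical, and the PID of the docker daemon has to be supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helper, not part of gdb or Docker.
# GDB can be overridden; defaults to the gdb binary found in PATH.
GDB="${GDB:-gdb}"

dump_threads() {
    pid="$1"   # PID of the process to inspect, e.g. the docker daemon
    # -p attaches to the process, -batch exits after the commands have run,
    # and "thread apply all bt" prints a backtrace for each thread.
    "$GDB" -p "$pid" -batch -ex "thread apply all bt"
}
```

Redirecting the output to a file, for example `dump_threads 1234 > threads.txt`, makes it easier to see how many threads are sitting in the same frame.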

-- 
Ciao,
Giulio


> Also are you able to capture through perf or strace what docker rm is
> blocked on?
>
> Tim

Re: Mesos slaves connecting but not active.

Posted by Giulio Eulisse <Gi...@cern.ch>.
Ciao,

I updated to 0.21.1 and that seems to have fixed the issue (at least the 
slave reconnects). Docker is still slow at deleting stuff.

-- 
Ciao,
Giulio

On 23 Mar 2015, at 18:20, Tim Chen wrote:

> How many containers are you running, and what is your system like?
>
> Also are you able to capture through perf or strace what docker rm is
> blocked on?
>
> Tim

Re: Mesos slaves connecting but not active.

Posted by Tim Chen <ti...@mesosphere.io>.
How many containers are you running, and what is your system like?

Also are you able to capture through perf or strace what docker rm is
blocked on?
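
One possible way to capture this with strace is sketched below. The function name `trace_pid`, the `STRACE` override variable and the default output path are hypothetical. The sketch also assumes that the daemon, rather than the "docker rm" client process, is where the time is spent, so the daemon's PID is the one to pass in.

```shell
#!/bin/sh
# Hypothetical helper, not part of strace or Docker.
# STRACE can be overridden; defaults to the strace binary found in PATH.
STRACE="${STRACE:-strace}"

trace_pid() {
    pid="$1"                           # PID to attach to
    out="${2:-/tmp/docker.strace}"     # output file (assumed default path)
    # -f follows threads and child processes, -tt adds timestamps,
    # -T shows the time spent inside each system call.
    "$STRACE" -f -tt -T -o "$out" -p "$pid"
}
```

Running `trace_pid 1234` while a "docker rm" is hanging and then looking for system calls with large `-T` values in the output file should show where the time goes.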

Tim


On Mon, Mar 23, 2015 at 10:12 AM, Giulio Eulisse <Gi...@cern.ch>
wrote:

> I suspect my problem is that "docker rm" takes forever in my case. I'm not
> running docker in docker though.

Re: Mesos slaves connecting but not active.

Posted by Giulio Eulisse <Gi...@cern.ch>.
I suspect my problem is that "docker rm" takes forever in my case. I'm 
not running docker in docker though.

On 23 Mar 2015, at 18:01, haosdent wrote:

> Is your issue related to this?
> https://issues.apache.org/jira/browse/MESOS-2115

Re: Mesos slaves connecting but not active.

Posted by haosdent <ha...@gmail.com>.
Is your issue related to this?
https://issues.apache.org/jira/browse/MESOS-2115

On Tue, Mar 24, 2015 at 12:52 AM, Giulio Eulisse <Gi...@cern.ch>
wrote:

> Hi,
>
> I'm running 0.20.1 and I seem to be having trouble because a
> Mesos slave is not able to recover the docker containers after a restart,
> resulting in a very long wait.
>
> Is this a known issue?
>
> --
> Ciao,
> Giulio
>



-- 
Best Regards,
Haosdent Huang