Posted to issues@mesos.apache.org by "Alexander Rukletsov (JIRA)" <ji...@apache.org> on 2015/08/11 17:52:45 UTC

[jira] [Commented] (MESOS-2706) When the docker-tasks grow, the time spare between Queuing task and Starting container grows

    [ https://issues.apache.org/jira/browse/MESOS-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14681976#comment-14681976 ] 

Alexander Rukletsov commented on MESOS-2706:
--------------------------------------------

I was able to reproduce the issue as well. Here is an excerpt of slave events for one of the first tasks in the sequence:
{code}
I0806 08:24:32.835737  1245 slave.cpp:1144] Got assigned task
I0806 08:24:32.936342  1245 slave.cpp:4208] Launching executor
I0806 08:24:32.938051  1245 slave.cpp:1401] Queuing task
I0806 08:24:32.952314  1242 docker.cpp:626] Starting container
I0806 08:24:33.637434  1243 docker.cpp:277] Checkpointing pid 21930
I0806 08:24:33.716608  1240 slave.cpp:3165] Monitoring executor
I0806 08:24:33.830728  1241 slave.cpp:1555] Sending queued task ... to executor
I0806 08:24:33.858212  1245 slave.cpp:2776] Forwarding the update TASK_RUNNING
{code}

and one of the last:
{code}
I0806 08:31:27.482077  1245 slave.cpp:1144] Got assigned task
I0806 08:31:27.502507  1245 slave.cpp:4208] Launching executor
I0806 08:31:27.503300  1245 slave.cpp:1401] Queuing task
I0806 08:31:39.053246  1246 docker.cpp:626] Starting container
I0806 08:32:47.695961  1246 docker.cpp:277] Checkpointing pid 19414
I0806 08:33:11.880014  1241 slave.cpp:3165] Monitoring executor
I0806 08:33:12.060046  1241 slave.cpp:1555] Sending queued task ... to executor
I0806 08:33:12.076020  1240 slave.cpp:2776] Forwarding the update TASK_RUNNING
{code}
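
To quantify the gap, the delay between "Queuing task" and "Starting container" can be computed directly from the glog timestamps. The following is only a minimal sketch, not part of Mesos: the helper names parse_ts and queue_to_start_delay are hypothetical, and it assumes both events fall on the same day and appear in order, as in the two excerpts above.

```python
from datetime import datetime

def parse_ts(line):
    """Parse the glog prefix, e.g. 'I0806 08:24:32.938051', into a datetime.

    The year is not logged, so strptime falls back to its default (1900);
    that is harmless here because only differences are used.
    """
    fields = line.split()
    # fields[0] is severity + MMDD (e.g. 'I0806'), fields[1] is HH:MM:SS.ffffff
    return datetime.strptime(fields[0][1:] + " " + fields[1], "%m%d %H:%M:%S.%f")

def queue_to_start_delay(lines):
    """Seconds between the first 'Queuing task' line and the next 'Starting container' line."""
    queued = None
    for line in lines:
        if queued is None and "Queuing task" in line:
            queued = parse_ts(line)
        elif queued is not None and "Starting container" in line:
            return (parse_ts(line) - queued).total_seconds()
    raise ValueError("no 'Queuing task' / 'Starting container' pair found")

# The relevant lines copied from the two excerpts above.
FIRST_TASK = [
    "I0806 08:24:32.938051  1245 slave.cpp:1401] Queuing task",
    "I0806 08:24:32.952314  1242 docker.cpp:626] Starting container",
]
LAST_TASK = [
    "I0806 08:31:27.503300  1245 slave.cpp:1401] Queuing task",
    "I0806 08:31:39.053246  1246 docker.cpp:626] Starting container",
]

if __name__ == "__main__":
    print(f"first task: {queue_to_start_delay(FIRST_TASK):.3f}s")  # 0.014s
    print(f"last task:  {queue_to_start_delay(LAST_TASK):.3f}s")   # 11.550s
```

For these excerpts the gap grows from roughly 0.014s for one of the first tasks to roughly 11.55s for one of the last.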

> When the docker-tasks grow, the time spare between Queuing task and Starting container grows
> --------------------------------------------------------------------------------------------
>
>                 Key: MESOS-2706
>                 URL: https://issues.apache.org/jira/browse/MESOS-2706
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 0.22.0
>         Environment: My Environment info:
> Mesos 0.22.0 and Marathon 0.82-RC1, both running on a single host.
> Every Docker task requires 0.02 CPU and 128 MB of memory, and the server has 8 CPUs and 24 GB of memory.
> So Mesos can launch thousands of tasks in theory.
> Each Docker task is very lightweight: it only launches an sshd service.
>            Reporter: chenqiuhao
>
> At the beginning, Marathon could launch Docker tasks very quickly, but once the number of tasks on the single mesos-slave host reached 50, Marathon seemed to launch Docker tasks slowly.
> So I checked the mesos-slave log and found that the time gap between "Queuing task" and "Starting container" grew.
> For example:
> Launching the 1st Docker task takes about 0.008s:
> [root@CNSH231434 mesos-slave]# tail -f slave.out |egrep 'Queuing task|Starting container'
> I0508 15:54:00.188350 225779 slave.cpp:1378] Queuing task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' for executor dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-0000
> I0508 15:54:00.196832 225781 docker.cpp:581] Starting container 'd0b0813a-6cb6-4dfd-bbce-f1b338744285' for task 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b' (and executor 'dev-rhel-sf.631d454d-f557-11e4-b4f4-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-0000'
> Launching the 50th Docker task takes about 4.9s:
> I0508 16:12:10.908596 225781 slave.cpp:1378] Queuing task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' for executor dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b of framework '20150202-112355-2684495626-5050-26153-0000
> I0508 16:12:15.801503 225778 docker.cpp:581] Starting container '482dd47f-b9ab-4b09-b89e-e361d6f004a4' for task 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b' (and executor 'dev-rhel-sf.ed3a6922-f559-11e4-ae87-628e0a30542b') of framework '20150202-112355-2684495626-5050-26153-0000'
> And when I launch the 100th Docker task, it takes about 13s!
> I ran the same test on a server with 24 CPUs and 256 GB of memory and got the same result.
> Has anybody had the same experience, or can somebody help run the same stress test?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)