Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2018/10/02 05:52:00 UTC

[jira] [Created] (MESOS-9283) Docker containerizer actor can get backlogged with large number of containers.

Jie Yu created MESOS-9283:
-----------------------------

             Summary: Docker containerizer actor can get backlogged with large number of containers.
                 Key: MESOS-9283
                 URL: https://issues.apache.org/jira/browse/MESOS-9283
             Project: Mesos
          Issue Type: Bug
          Components: containerization
    Affects Versions: 1.7.0, 1.6.1, 1.5.1
            Reporter: Jie Yu


We observed this during some internal scale testing.

When launching 300+ Docker containers on a single agent box, the Docker containerizer actor can get backlogged. As a result, API calls like `GET_CONTAINERS` become unresponsive. It also blocks the Mesos containerizer from launching containers if the agent is started with `--containerizers=docker,mesos`, because the composing containerizer invokes the Docker containerizer's launch first, and that call gets queued behind the backlog.

Profiling results show that the bottleneck is `os::killtree`, which will be invoked when the Docker commands are discarded (e.g., client disconnect, etc.).

For this particular case, `os::killtree` is not really necessary because the docker command does not fork additional subprocesses. If we use the argv version of `subprocess` to launch docker commands, we can simply use `os::kill` instead. We confirmed that switching to `os::kill` makes the performance issue go away, and the agent can easily scale up to 300+ containers.
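
To illustrate the idea, here is a minimal sketch in plain POSIX C++. It is not the Mesos code and does not use libprocess `subprocess` or stout; `launchArgv` and `killAndReap` are hypothetical names made up for this example. The assumption is that when a command is exec'ed directly from an argv vector (no intermediate shell), the pid we get back is the command itself, so a single `kill(2)` on that pid is enough and no process tree walk is needed.

```cpp
#include <cassert>
#include <csignal>
#include <string>
#include <vector>

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork and exec `argv` directly, with no shell in
// between, so the returned pid is the command itself.
// Returns -1 if fork fails.
pid_t launchArgv(const std::vector<std::string>& argv)
{
  assert(!argv.empty());

  // Build the char* array before forking so the child does not allocate.
  std::vector<char*> raw;
  for (const std::string& arg : argv) {
    raw.push_back(const_cast<char*>(arg.c_str()));
  }
  raw.push_back(nullptr);

  pid_t pid = ::fork();
  if (pid == 0) {
    ::execvp(raw[0], raw.data());
    ::_exit(127); // Only reached if exec failed.
  }

  return pid;
}

// Hypothetical helper: signal exactly one pid and reap it.
// Returns the signal that terminated the child, 0 if the child exited
// on its own, or -1 on error. No process tree traversal is involved.
int killAndReap(pid_t pid, int sig)
{
  if (::kill(pid, sig) != 0) {
    return -1;
  }

  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }

  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

For example, `killAndReap(launchArgv({"sleep", "30"}), SIGKILL)` signals the `sleep` process directly. If the same command were started through a shell, the pid would belong to the shell, which is the situation where a tree kill is needed to reach the real command.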



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)