
Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/07/31 16:27:00 UTC

[jira] [Created] (MESOS-9918) Agent fails to scale many tasks/containers with command health checks

Greg Mann created MESOS-9918:
--------------------------------

             Summary: Agent fails to scale many tasks/containers with command health checks
                 Key: MESOS-9918
                 URL: https://issues.apache.org/jira/browse/MESOS-9918
             Project: Mesos
          Issue Type: Task
          Components: agent, containerization
            Reporter: Greg Mann


When ~50 containers that all specify command health checks are launched simultaneously in a task group on an agent, the containers fail to become healthy. The {{LAUNCH_NESTED_CONTAINER_SESSION}} calls for the health checks time out, which causes the task group to fail.

We should both investigate the cause of the timeouts (based on previous profiling efforts, the likely cause is the cost of forking from the agent process) and consider rate-limiting options that would allow operators to scale large numbers of containers simultaneously.
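One possible rate-limiting option is to cap how many health-check session launches the agent has in flight at once, so that a burst of ~50 checks does not trigger ~50 concurrent forks. The following single-threaded Python model sketches that idea. `LaunchThrottler`, `max_in_flight`, and `on_complete` are hypothetical names, not Mesos APIs. The cap of 8 is an arbitrary illustrative value, not a measured or recommended setting.

```python
from collections import deque
from typing import Callable, Deque


class LaunchThrottler:
    """Hypothetical cap on concurrent health-check session launches.

    At most `max_in_flight` launches run at once; the rest wait in FIFO order.
    """

    def __init__(self, max_in_flight: int) -> None:
        if max_in_flight < 1:
            raise ValueError("max_in_flight must be at least 1")
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.peak_in_flight = 0
        self.pending: Deque[Callable[[], None]] = deque()

    def submit(self, launch: Callable[[], None]) -> None:
        """Start `launch` now if a slot is free; otherwise queue it."""
        if self.in_flight < self.max_in_flight:
            self._start(launch)
        else:
            self.pending.append(launch)

    def on_complete(self) -> None:
        """Free a slot when a launch finishes and start the oldest queued one."""
        if self.in_flight == 0:
            raise RuntimeError("on_complete() called with nothing in flight")
        self.in_flight -= 1
        if self.pending:
            self._start(self.pending.popleft())

    def _start(self, launch: Callable[[], None]) -> None:
        # Track the in-flight count (and its peak) before running the launch.
        self.in_flight += 1
        self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
        launch()


# Simulate ~50 simultaneous health-check launches with a cap of 8.
started = []
throttler = LaunchThrottler(max_in_flight=8)
for container_id in range(50):
    # Default argument binds the current id to each lambda.
    throttler.submit(lambda cid=container_id: started.append(cid))

print(len(started), len(throttler.pending))  # 8 42

# Complete launches one at a time until nothing is in flight.
while throttler.in_flight:
    throttler.on_complete()

print(len(started), throttler.peak_in_flight)  # 50 8
```

A concurrency cap bounds how much forking the agent does at any moment, but it adds queueing delay for later checks. Whether that delay fits within the health-check timeout would need to be measured. A token bucket that limits launches per second is an alternative design with similar trade-offs.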



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)