Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/07/31 16:27:00 UTC
[jira] [Created] (MESOS-9918) Agent fails to scale many tasks/containers with command health checks
Greg Mann created MESOS-9918:
--------------------------------
Summary: Agent fails to scale many tasks/containers with command health checks
Key: MESOS-9918
URL: https://issues.apache.org/jira/browse/MESOS-9918
Project: Mesos
Issue Type: Task
Components: agent, containerization
Reporter: Greg Mann
When ~50 containers that all specify command health checks are launched simultaneously in a task group on an agent, they fail to become healthy. The {{LAUNCH_NESTED_CONTAINER_SESSION}} calls for the health checks time out, leading to task group failure.
We should investigate the cause of the timeouts (based on previous profiling efforts, it is likely the cost of forking from the agent process) and also consider rate-limiting options that would allow operators to scale large numbers of containers simultaneously.
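As a rough illustration of the rate-limiting idea only, the sketch below caps how many launch operations are in flight at once using a semaphore. It is not Mesos code and does not use any Mesos API: {{run_limited}}, {{demo}}, {{fake_launch}}, and the {{max_concurrent}} parameter are hypothetical names, and the fake launch simply sleeps in place of a real {{LAUNCH_NESTED_CONTAINER_SESSION}} call.

```python
import asyncio


async def run_limited(launches, max_concurrent):
    """Run zero-argument coroutine functions with at most
    max_concurrent of them in flight at any moment."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(launch):
        # Each launch waits here until a slot is free.
        async with semaphore:
            return await launch()

    return await asyncio.gather(*(guarded(launch) for launch in launches))


def demo(num_checks=50, max_concurrent=8):
    """Simulate num_checks health-check launches (hypothetical stand-ins)
    and report the peak number that ran concurrently."""
    in_flight = 0
    peak = 0

    async def fake_launch():
        # Stand-in for an expensive launch; assumption: the real cost
        # would be the fork performed by the agent process.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.001)
        in_flight -= 1
        return True

    results = asyncio.run(
        run_limited([fake_launch] * num_checks, max_concurrent))
    return peak, results


if __name__ == "__main__":
    peak, results = demo()
    print(f"launched {len(results)} checks, peak concurrency {peak}")
```

With a cap like this, the 50 simulated launches all complete while the number running at the same time never exceeds the limit. Whether a limit of this kind belongs in the agent, the executor, or the health checker is left open by this ticket.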
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)