Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2019/01/17 01:00:59 UTC
[jira] [Commented] (MESOS-9509) Benchmark command health checks in default executor

    [ https://issues.apache.org/jira/browse/MESOS-9509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744576#comment-16744576 ] 

Greg Mann commented on MESOS-9509:
----------------------------------

I have a repository here containing some tooling for benchmarking Mesos checks by launching a variable number of tasks in a single pod using {{mesos-execute}}: [https://github.com/greggomann/mesos-healthcheck-benchmark]

The main results I've produced so far show how the overall check rate and check responsiveness vary with the number of tasks in the pod:
 !check-rate.png|width=544,height=408!
 !check-responsiveness.png|width=544,height=408! 
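The comment doesn't say exactly how the two plotted metrics were computed. Below is a minimal sketch under stated assumptions. It takes the overall check rate to mean completed checks per second across every task in the pod. It uses the mean gap between consecutive check completions of the same task as one possible proxy for responsiveness. The input format, a list of (task_id, completion_time_seconds) tuples, and both helper names are hypothetical. They are not taken from the benchmark repository.

```python
from collections import defaultdict
from statistics import mean


def check_rate(completions):
    """Overall check rate (checks/second) across all tasks in the pod.

    `completions` is a hypothetical list of (task_id, completion_time_seconds)
    tuples, e.g. parsed from logs or status updates.
    """
    times = sorted(t for _, t in completions)
    if len(times) < 2 or times[-1] == times[0]:
        raise ValueError("need at least two distinct check completion times")
    # Count the intervals between completions rather than the completions
    # themselves, so the start time of the very first check is not needed.
    return (len(times) - 1) / (times[-1] - times[0])


def mean_gap_per_task(completions):
    """Mean time between consecutive check completions of the same task.

    With a zero check interval, this approximates one full check round trip
    per task; it is used here only as an assumed proxy for responsiveness.
    """
    by_task = defaultdict(list)
    for task_id, t in completions:
        by_task[task_id].append(t)
    gaps = []
    for times in by_task.values():
        times.sort()
        gaps.extend(b - a for a, b in zip(times, times[1:]))
    if not gaps:
        raise ValueError("need at least two completions for some task")
    return mean(gaps)
```

For example, with samples = [("a", 0.0), ("b", 0.5), ("a", 1.0), ("b", 1.5), ("a", 2.0)], check_rate(samples) is 2.0 checks/second and mean_gap_per_task(samples) is 1.0 second.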

In the above tests, the check interval was set to zero and the timeout was set to 5 minutes so that all checks would be launched again immediately once they completed.
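As a rough illustration of that configuration, the sketch below builds a TaskGroupInfo JSON document with N tasks. It could then be passed to mesos-execute via its --task_group flag. It assumes the general CheckInfo message (the "check" field, type COMMAND, with interval_seconds and timeout_seconds). A command health check would instead use the "health_check" field, whose shape differs slightly. The long-running task command, the check command ("true"), the resource amounts, and the function name are all hypothetical placeholders. They are not values from the benchmark repository.

```python
import json


def make_task_group(num_tasks, check_cmd="true"):
    """Build a TaskGroupInfo-style dict with `num_tasks` tasks, each carrying
    a COMMAND check that is relaunched immediately after it completes."""
    tasks = []
    for i in range(num_tasks):
        tasks.append({
            "name": f"task-{i}",
            "task_id": {"value": f"task-{i}"},
            # Placeholder agent ID; the real one is set when the task is launched.
            "agent_id": {"value": ""},
            # Hypothetical small resource footprint per task.
            "resources": [
                {"name": "cpus", "type": "SCALAR", "scalar": {"value": 0.1}},
                {"name": "mem", "type": "SCALAR", "scalar": {"value": 32}},
            ],
            # Long-running task so that only the checks generate load.
            "command": {"value": "sleep 100000"},
            "check": {
                "type": "COMMAND",
                "command": {"command": {"value": check_cmd}},
                "interval_seconds": 0,   # relaunch as soon as the previous check finishes
                "timeout_seconds": 300,  # 5 minutes, so checks effectively never time out
            },
        })
    return {"tasks": tasks}


if __name__ == "__main__":
    print(json.dumps(make_task_group(10), indent=2))
```

Saved to a file, the output could be launched with something like mesos-execute --master=<master> --task_group=file:///path/to/task_group.json. The number of tasks would be varied from run to run to produce curves like the ones above.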

I also have perf traces from these tests, and I'll update this ticket with flame graphs once I've generated them. I'd also like to analyze the logs to determine how much time the agent spends in each stage of the check container launch.

For now I'm moving this ticket back to Accepted. Either I or someone else can pick it up again when time allows, since I believe there's much more work to do here.

> Benchmark command health checks in default executor
> ---------------------------------------------------
>
>                 Key: MESOS-9509
>                 URL: https://issues.apache.org/jira/browse/MESOS-9509
>             Project: Mesos
>          Issue Type: Task
>          Components: executor
>            Reporter: Vinod Kone
>            Assignee: Greg Mann
>            Priority: Major
>              Labels: default-executor, foundations, mesosphere, perfomance
>         Attachments: check-rate.png, check-responsiveness.png
>
>
> TCP/HTTP health checks were extensively scale-tested as part of https://mesosphere.com/blog/introducing-mesos-native-health-checks-apache-mesos-part-2/.
> We should do the same for command checks run by the default executor, because they use a very different mechanism (the agent fork/execs the check command in a nested container) and will have very different scalability characteristics.
> We should also use these benchmarks as an opportunity to produce perf traces of the Mesos agent (both with and without process inheritance) so that a thorough analysis of the performance can be done as part of MESOS-9513.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)