You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Armand Grillet (JIRA)" <ji...@apache.org> on 2017/11/08 11:24:00 UTC

[jira] [Commented] (MESOS-7361) Command checks via agent pollute agent logs.

    [ https://issues.apache.org/jira/browse/MESOS-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16243758#comment-16243758 ] 

Armand Grillet commented on MESOS-7361:
---------------------------------------

This issue is due to the fact that if the check is a COMMAND check, the checker will delegate the execution of the check to the Mesos agent via the {{LaunchNestedContainerSession}} API call: https://github.com/apache/mesos/blob/b13c4c3683fd6bad702a7fb9e24cfc3414b921da/src/checks/checker_process.cpp#L555? If we use the universal containerizer, this means following that workflow: https://docs.google.com/document/d/1FtcyQkDfGp-bPHTW4pUoqQCgVlPde936bo-IIENO_ho/edit# 

We do not differentiate regular nested containers and health checks (that are periodically started nested containers), this results in 26 lines of logs for one command health check on a Mesos agent due to its nested container state:

{code}
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118629  4727 containerizer.cpp:1056] Trying to chown '/var/lib/mesos/slave/slaves/d71d7fe2-348d-4091-9c26-445d103b3cf1-S1/frameworks/d71d7fe2-348d-4091-9c26-445d103b3cf1-0001/executors/instance-pod-with-healthcheck.8c197d66-b347-11e7-a146-42c7ec5fa9b9/runs/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf' to user 'root'
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.118783  4727 containerizer.cpp:1085] Starting container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.119021  4727 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from PROVISIONING to PREPARING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.144028  4721 switchboard.cpp:575] Created I/O switchboard server (pid: 6669) listening on socket file '/tmp/mesos-io-switchboard-2ee398cd-f448-4bb9-9d80-e57069b9c199' for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.171833  4723 linux_launcher.cpp:429] Launching nested container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf and cloning with namespaces
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.203270  4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from PREPARING to ISOLATING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206295  4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from ISOLATING to FETCHING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.206835  4729 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from FETCHING to RUNNING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.241371  4728 http.cpp:2854] Received EOF attach response for 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242112  4725 containerizer.cpp:2168] Destroying container
61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf in RUNNING state
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.242159  4721 http.cpp:2869] Launch nested container session connection for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf closed
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242184  4725 containerizer.cpp:2714] Transitioning the state of container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf from RUNNING to DESTROYING
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.242352  4729 linux_launcher.cpp:505] Asked to destroy container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.243599  4729 linux_launcher.cpp:548] Using freezer to destroy cgroup mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.246537  4724 cgroups.cpp:3055] Freezing cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.250340  4722 cgroups.cpp:1413] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf after 3.11808ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.253630  4729 cgroups.cpp:3073] Thawing cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.256338  4722 cgroups.cpp:1442] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/mesos/ba6aecb9-1b08-4356-af26-7a42a32043ff/mesos/check-5517dae6-9d41-492e-b5ae-43bacc8692bf after 2.663936ms
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336752  4729 switchboard.cpp:888] I/O switchboard server process for container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf has terminated (status=0)
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.336876  4725 containerizer.cpp:2614] Container 61a37663-ebbe-4e7e-8a0d-9dd7918b2348.ba6aecb9-1b08-4356-af26-7a42a32043ff.check-5517dae6-9d41-492e-b5ae-43bacc8692bf has exited
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338035  4724 container_assigner.cpp:101] Unregistering container_id[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }].
Oct 30 14:43:15 mesos-agent[4718]: W1017 14:43:15.338174  4843 container_assigner_strategy.cpp:192] No ephemeral-port reader had been assigned to container[value: "check-5517dae6-9d41-492e-b5ae-43bacc8692bf" parent { value: "ba6aecb9-1b08-4356-af26-7a42a32043ff" parent { value: "61a37663-ebbe-4e7e-8a0d-9dd7918b2348" } }], cannot unregister
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338228  4843 container_state_cache_impl.cpp:145] Removing container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]
Oct 30 14:43:15 mesos-agent[4718]: E1017 14:43:15.338282  4843 container_state_cache_impl.cpp:148] Failed to remove container file[/run/dcos/mesos/isolators/com_mesosphere_MetricsIsolatorModule/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf]: No such file or directory
Oct 30 14:43:15 mesos-agent[4718]: I1017 14:43:15.338367  4725 containerizer.cpp:2452] Checkpointing termination state to nested container's runtime directory '/var/run/mesos/containers/61a37663-ebbe-4e7e-8a0d-9dd7918b2348/containers/ba6aecb9-1b08-4356-af26-7a42a32043ff/containers/check-5517dae6-9d41-492e-b5ae-43bacc8692bf/termination'
{code}

This is excessive as, for a cluster running 100 tasks each having a health check running every 30 seconds, this would mean around 5200 lines of logs per minute.


> Command checks via agent pollute agent logs.
> --------------------------------------------
>
>                 Key: MESOS-7361
>                 URL: https://issues.apache.org/jira/browse/MESOS-7361
>             Project: Mesos
>          Issue Type: Improvement
>          Components: agent
>            Reporter: Alexander Rukletsov
>            Assignee: Armand Grillet
>              Labels: check, health-check, mesosphere
>
> Command checks via agent leverage debug container API of the agent to start checks. Each such invocation triggers a bunch of logs on the agent, because the API was not originally designed with periodic invocations in mind. We should find a way to avoid excessive logging on the agent.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)