Posted to issues@mesos.apache.org by "Anthony Scalisi (JIRA)" <ji...@apache.org> on 2016/03/04 22:24:40 UTC

[jira] [Updated] (MESOS-4869) /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory

     [ https://issues.apache.org/jira/browse/MESOS-4869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anthony Scalisi updated MESOS-4869:
-----------------------------------
    Description: 
We switched our health checks in Marathon from HTTP to COMMAND:

{noformat}
"healthChecks": [
    {
      "protocol": "COMMAND",
      "path": "/ops/ping",
      "command": { "value": "curl --silent -f -X GET http://$HOST:$PORT0/ops/ping > /dev/null" },
      "gracePeriodSeconds": 90,
      "intervalSeconds": 2,
      "portIndex": 0,
      "timeoutSeconds": 5,
      "maxConsecutiveFailures": 3
    }
  ]
{noformat}

All our applications have the same health check (and /ops/ping endpoint).

Even though we have the issue on all our Mesos slaves, I'm going to focus on a particular one: *mesos-slave-i-e3a9c724*.

The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:

!https://i.imgur.com/gbRf804.png!

Here is a *docker ps* on it:

{noformat}
root@mesos-slave-i-e3a9c724 # docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                     NAMES
4f7c0aa8d03a        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31926->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
66f2fc8f8056        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31939->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
f7382f241fce        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31656->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
880934c0049e        java:8              "/bin/sh -c 'JAVA_OPT"   24 hours ago        Up 24 hours         0.0.0.0:31371->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
5eab1f8dac4a        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31500->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
b63740fe56e7        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31382->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
5c7a9ea77b0e        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31186->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
53065e7a31ad        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31839->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
{noformat}

Here is a *docker stats* on it:

{noformat}
root@mesos-slave-i-e3a9c724  # docker stats
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O               BLOCK I/O
4f7c0aa8d03a        2.93%               797.3 MB / 1.611 GB   49.50%              1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
53065e7a31ad        8.30%               738.9 MB / 1.611 GB   45.88%              419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
5c7a9ea77b0e        4.91%               1.081 GB / 1.611 GB   67.10%              423 MB / 526.5 MB     3.219 MB / 61.44 kB
5eab1f8dac4a        3.13%               1.007 GB / 1.611 GB   62.53%              2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
66f2fc8f8056        3.15%               768.1 MB / 1.611 GB   47.69%              258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
880934c0049e        10.07%              735.1 MB / 1.611 GB   45.64%              1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
b63740fe56e7        12.04%              629 MB / 1.611 GB     39.06%              10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
f7382f241fce        6.21%               505 MB / 1.611 GB     31.36%              153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
{noformat}

Not much else is running on the slave, yet the used memory doesn't match up with the tasks' memory:

{noformat}
Mem:16047M used:13340M buffers:1139M cache:776M
{noformat}
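
For a rough comparison, the containers in the *docker stats* output above add up to only about 6.3 GB, while the host reports roughly 11.4 GB used even after subtracting buffers and cache. A quick way to sum them (assuming the *docker stats* column layout shown above):

{noformat}
root@mesos-slave-i-e3a9c724 # docker stats --no-stream | awk '
  NR > 1 {
    mem = $3                       # memory usage value, e.g. 797.3
    if ($4 == "GB") mem *= 1024    # normalize GB to MB
    if ($4 == "kB") mem /= 1024
    sum += mem
  }
  END { printf("containers total: %.1f MB\n", sum) }'
{noformat}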


If I exec into one of the containers (*java:8* image), I can see the shell calls that execute the curl specified in the health check; they run as expected and exit correctly.
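
For example, running the same wrapped command by hand succeeds (container name and port taken from the *docker ps* output above, host IP from the executor address; adjust for whichever container you pick):

{noformat}
root@mesos-slave-i-e3a9c724 # docker exec \
    mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d \
    sh -c 'curl --silent -f -X GET http://10.92.32.63:31926/ops/ping > /dev/null'
root@mesos-slave-i-e3a9c724 # echo $?   # 0, i.e. healthy (curl -f exits non-zero on HTTP errors)
{noformat}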

The only change that lined up with the memory usage woes was moving the health checks from Marathon to Mesos, so I decided to take a look:

{noformat}
root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
 2504 ?        Sl    47:33 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
 4220 ?        Sl    47:26 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
 7444 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
10368 ?        Sl     1:30 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
12399 ?        Sl     9:45 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
13538 ?        Sl    24:54 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
32034 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
{noformat}

The memory usage is really bad:

{noformat}
root@mesos-slave-i-e3a9c724 # ps -eo size,pid,user,command --sort -size | grep health_check | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
      2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
      2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
      1673.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
      1161.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
         0.32 Mb grep --color=auto health_check
{noformat}
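
As an aside, *size* in ps estimates the virtual/allocatable footprint rather than resident memory; resident usage can be cross-checked the same way with *rss* if that's useful, e.g.:

{noformat}
root@mesos-slave-i-e3a9c724 # ps -eo rss,pid,command --sort -rss | grep health_check | grep -v grep | \
    awk '{ printf("%10.2f Mb  pid %s\n", $1/1024, $2) }'
{noformat}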

Killing the *mesos-health-check* process for each container fixes our memory issues (though I assume the health checks won't be reported anymore):

{noformat}
root@mesos-slave-i-e3a9c724 # date ; free -m ; ps awwx | grep health_check | grep -v grep | awk '{print $1}' | xargs -I% -P1 kill % ; date ; free -m

Fri Mar  4 21:20:55 UTC 2016

             total       used       free     shared    buffers     cached
Mem:         16047      13538       2508          0       1140        774
-/+ buffers/cache:      11623       4423
Swap:            0          0          0

Fri Mar  4 21:20:56 UTC 2016
             total       used       free     shared    buffers     cached
Mem:         16047       9101       6945          0       1140        774
-/+ buffers/cache:       7186       8860
Swap:            0          0          0
{noformat}

We're reverting to Marathon-run health checks for now, but I'd like to emphasize that this is happening across all our slaves (it's not an isolated issue).

Thanks for looking into it :)

  was:
We switched our health checks in Marathon from HTTP to COMMAND:

{quote}
"healthChecks": [
    {
      "protocol": "COMMAND",
      "path": "/ops/ping",
      "command": { "value": "curl --silent -f -X GET http://$HOST:$PORT0/ops/ping > /dev/null" },
      "gracePeriodSeconds": 90,
      "intervalSeconds": 2,
      "portIndex": 0,
      "timeoutSeconds": 5,
      "maxConsecutiveFailures": 3
    }
  ]
{quote}

All our applications have the same health check (and /ops/ping endpoint).

Even though we have the issue on all our Mesos slaves, I'm going to focus on a particular one: *mesos-slave-i-e3a9c724*.

The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:

!https://i.imgur.com/gbRf804.png!

Here is a *docker ps* on it:

{quote}
root@mesos-slave-i-e3a9c724 # docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                     NAMES
4f7c0aa8d03a        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31926->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
66f2fc8f8056        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31939->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
f7382f241fce        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31656->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
880934c0049e        java:8              "/bin/sh -c 'JAVA_OPT"   24 hours ago        Up 24 hours         0.0.0.0:31371->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
5eab1f8dac4a        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31500->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
b63740fe56e7        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31382->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
5c7a9ea77b0e        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31186->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
53065e7a31ad        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31839->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
{quote}

Here is a *docker stats* on it:

{quote}
root@mesos-slave-i-e3a9c724  # docker stats
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O               BLOCK I/O
4f7c0aa8d03a        2.93%               797.3 MB / 1.611 GB   49.50%              1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
53065e7a31ad        8.30%               738.9 MB / 1.611 GB   45.88%              419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
5c7a9ea77b0e        4.91%               1.081 GB / 1.611 GB   67.10%              423 MB / 526.5 MB     3.219 MB / 61.44 kB
5eab1f8dac4a        3.13%               1.007 GB / 1.611 GB   62.53%              2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
66f2fc8f8056        3.15%               768.1 MB / 1.611 GB   47.69%              258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
880934c0049e        10.07%              735.1 MB / 1.611 GB   45.64%              1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
b63740fe56e7        12.04%              629 MB / 1.611 GB     39.06%              10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
f7382f241fce        6.21%               505 MB / 1.611 GB     31.36%              153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
{quote}

Not much else is running on the slave, yet the used memory doesn't match up with the tasks' memory:

{quote}
Mem:16047M used:13340M buffers:1139M cache:776M
{quote}


If I exec into one of the containers (*java:8* image), I can see the shell calls that execute the curl specified in the health check; they run as expected and exit correctly.

The only change that lined up with the memory usage woes was moving the health checks from Marathon to Mesos, so I decided to take a look:

{quote}
root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
 2504 ?        Sl    47:33 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
 4220 ?        Sl    47:26 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
 7444 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
10368 ?        Sl     1:30 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
12399 ?        Sl     9:45 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
13538 ?        Sl    24:54 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
32034 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
{quote}

The memory usage is really bad:

{quote}
root@mesos-slave-i-e3a9c724 # ps -eo size,pid,user,command --sort -size | grep health_check | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
      2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
      2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
      1673.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
      1161.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
       649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
         0.32 Mb grep --color=auto health_check
{quote}

Killing the *mesos-health-check* process for each container fixes our memory issues (though I assume the health checks won't be reported anymore):

{quote}
root@mesos-slave-i-e3a9c724 # date ; free -m ; ps awwx | grep health_check | grep -v grep | awk '{print $1}' | xargs -I% -P1 kill % ; date ; free -m

Fri Mar  4 21:20:55 UTC 2016

             total       used       free     shared    buffers     cached
Mem:         16047      13538       2508          0       1140        774
-/+ buffers/cache:      11623       4423
Swap:            0          0          0

Fri Mar  4 21:20:56 UTC 2016
             total       used       free     shared    buffers     cached
Mem:         16047       9101       6945          0       1140        774
-/+ buffers/cache:       7186       8860
Swap:            0          0          0
{quote}

We're reverting to Marathon-run health checks for now, but I'd like to emphasize that this is happening across all our slaves (it's not an isolated issue).

Thanks for looking into it :)


> /usr/libexec/mesos/mesos-health-check using/leaking a lot of memory
> -------------------------------------------------------------------
>
>                 Key: MESOS-4869
>                 URL: https://issues.apache.org/jira/browse/MESOS-4869
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.27.1
>            Reporter: Anthony Scalisi
>            Priority: Critical
>
> We switched our health checks in Marathon from HTTP to COMMAND:
> {noformat}
> "healthChecks": [
>     {
>       "protocol": "COMMAND",
>       "path": "/ops/ping",
>       "command": { "value": "curl --silent -f -X GET http://$HOST:$PORT0/ops/ping > /dev/null" },
>       "gracePeriodSeconds": 90,
>       "intervalSeconds": 2,
>       "portIndex": 0,
>       "timeoutSeconds": 5,
>       "maxConsecutiveFailures": 3
>     }
>   ]
> {noformat}
> All our applications have the same health check (and /ops/ping endpoint).
> Even though we have the issue on all our Mesos slaves, I'm going to focus on a particular one: *mesos-slave-i-e3a9c724*.
> The slave has 16 gigs of memory, with about 12 gigs allocated for 8 tasks:
> !https://i.imgur.com/gbRf804.png!
> Here is a *docker ps* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724 # docker ps
> CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                     NAMES
> 4f7c0aa8d03a        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31926->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d
> 66f2fc8f8056        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31939->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a
> f7382f241fce        java:8              "/bin/sh -c 'JAVA_OPT"   6 hours ago         Up 6 hours          0.0.0.0:31656->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d
> 880934c0049e        java:8              "/bin/sh -c 'JAVA_OPT"   24 hours ago        Up 24 hours         0.0.0.0:31371->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0
> 5eab1f8dac4a        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31500->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5ac75198-283f-4349-a220-9e9645b313e7
> b63740fe56e7        java:8              "/bin/sh -c 'JAVA_OPT"   46 hours ago        Up 46 hours         0.0.0.0:31382->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe
> 5c7a9ea77b0e        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31186->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4
> 53065e7a31ad        java:8              "/bin/sh -c 'JAVA_OPT"   2 days ago          Up 2 days           0.0.0.0:31839->8080/tcp   mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c
> {noformat}
> Here is a *docker stats* on it:
> {noformat}
> root@mesos-slave-i-e3a9c724  # docker stats
> CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O               BLOCK I/O
> 4f7c0aa8d03a        2.93%               797.3 MB / 1.611 GB   49.50%              1.277 GB / 1.189 GB   155.6 kB / 151.6 kB
> 53065e7a31ad        8.30%               738.9 MB / 1.611 GB   45.88%              419.6 MB / 554.3 MB   98.3 kB / 61.44 kB
> 5c7a9ea77b0e        4.91%               1.081 GB / 1.611 GB   67.10%              423 MB / 526.5 MB     3.219 MB / 61.44 kB
> 5eab1f8dac4a        3.13%               1.007 GB / 1.611 GB   62.53%              2.737 GB / 2.564 GB   6.566 MB / 118.8 kB
> 66f2fc8f8056        3.15%               768.1 MB / 1.611 GB   47.69%              258.5 MB / 252.8 MB   1.86 MB / 151.6 kB
> 880934c0049e        10.07%              735.1 MB / 1.611 GB   45.64%              1.451 GB / 1.399 GB   573.4 kB / 94.21 kB
> b63740fe56e7        12.04%              629 MB / 1.611 GB     39.06%              10.29 GB / 9.344 GB   8.102 MB / 61.44 kB
> f7382f241fce        6.21%               505 MB / 1.611 GB     31.36%              153.4 MB / 151.9 MB   5.837 MB / 94.21 kB
> {noformat}
> Not much else is running on the slave, yet the used memory doesn't match up with the tasks' memory:
> {noformat}
> Mem:16047M used:13340M buffers:1139M cache:776M
> {noformat}
> If I exec into one of the containers (*java:8* image), I can see the shell calls that execute the curl specified in the health check; they run as expected and exit correctly.
> The only change that lined up with the memory usage woes was moving the health checks from Marathon to Mesos, so I decided to take a look:
> {noformat}
> root@mesos-slave-i-e3a9c724 # ps awwx | grep health_check | grep -v grep
>  2504 ?        Sl    47:33 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
>  4220 ?        Sl    47:26 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
>  7444 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
> 10368 ?        Sl     1:30 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
> 12399 ?        Sl     9:45 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
> 13538 ?        Sl    24:54 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
> 32034 ?        Sl     1:31 /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
> {noformat}
> The memory usage is really bad:
> {noformat}
> root@mesos-slave-i-e3a9c724 # ps -eo size,pid,user,command --sort -size | grep health_check | awk '{ hr=$1/1024 ; printf("%13.2f Mb ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }'
>       2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:53432 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.f0a3f4c5-ecdb-4f97-bede-d744feda670c sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_email-green.b086206a-e000-11e5-a617-02429957d388
>       2185.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:54982 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.b05043c5-44fc-40bf-aea2-10354e8f5ab4 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_chat-green.ed53ec41-e000-11e5-a617-02429957d388
>       1673.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:56598 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.5d417f16-df24-49d5-a5b0-38a7966460fe sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_metric-green.75296986-e0c7-11e5-a617-02429957d388
>       1161.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:44815 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.23dfe408-ab8f-40be-bf6f-ce27fe885ee0 sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_integration-green.143865d5-e17d-11e5-a617-02429957d388
>        649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:59422 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.60972150-b2b1-45d8-8a55-d63e81b8372a sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_identity-green.aeb2ef3b-e219-11e5-a617-02429957d388
>        649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:40981 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.3dbb1004-5bb8-432f-8fd8-b863bd29341d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_channel-green.c6fbd2ac-e219-11e5-a617-02429957d388
>        649.39 Mb /usr/libexec/mesos/mesos-health-check --executor=(1)@10.92.32.63:48119 --health_check_json={"command":{"shell":true,"value":"docker exec mesos-29e183be-f611-41b4-824c-2d05b052231b-S6.39731a2f-d29e-48d1-9927-34ab8c5f557d sh -c \" curl --silent -f -X GET http:\/\/$HOST:$PORT0\/ops\/ping > \/dev\/null \""},"consecutive_failures":3,"delay_seconds":0.0,"grace_period_seconds":90.0,"interval_seconds":2.0,"timeout_seconds":5.0} --task_id=prod_talkk_push-green.601337e6-e219-11e5-a617-02429957d388
>          0.32 Mb grep --color=auto health_check
> {noformat}
> Killing the *mesos-health-check* process for each container fixes our memory issues (though I assume the health checks won't be reported anymore):
> {noformat}
> root@mesos-slave-i-e3a9c724 # date ; free -m ; ps awwx | grep health_check | grep -v grep | awk '{print $1}' | xargs -I% -P1 kill % ; date ; free -m
> Fri Mar  4 21:20:55 UTC 2016
>              total       used       free     shared    buffers     cached
> Mem:         16047      13538       2508          0       1140        774
> -/+ buffers/cache:      11623       4423
> Swap:            0          0          0
> Fri Mar  4 21:20:56 UTC 2016
>              total       used       free     shared    buffers     cached
> Mem:         16047       9101       6945          0       1140        774
> -/+ buffers/cache:       7186       8860
> Swap:            0          0          0
> {noformat}
> We're reverting to Marathon-run health checks for now, but I'd like to emphasize that this is happening across all our slaves (it's not an isolated issue).
> Thanks for looking into it :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)