Posted to issues@mesos.apache.org by "Qian Zhang (Jira)" <ji...@apache.org> on 2020/06/10 01:24:00 UTC

[jira] [Created] (MESOS-10139) Mesos agent host may become unresponsive when it is under low memory pressure

Qian Zhang created MESOS-10139:
----------------------------------

             Summary: Mesos agent host may become unresponsive when it is under low memory pressure
                 Key: MESOS-10139
                 URL: https://issues.apache.org/jira/browse/MESOS-10139
             Project: Mesos
          Issue Type: Bug
            Reporter: Qian Zhang


When a user launches a task that uses a large amount of memory on an agent host (e.g., a task running `stress --vm 1 --vm-bytes 29800M --vm-hang 0` on an agent host with 32 GB of memory), the whole agent host becomes unresponsive: no commands can be executed anymore, although the host is still pingable. A few minutes later, the Mesos master marks the agent as unreachable and updates the state of all its tasks to `TASK_UNREACHABLE`.
{code:java}
May 26 02:13:31 ip-172-16-15-17.us-west-2.compute.internal mesos-master[15468]: I0526 02:13:31.103382 15491 master.cpp:260] Scheduling transition of agent 89d2d679-fa08-49be-94c3-880ebb595212-S0 to UNREACHABLE because of health check timeout
May 26 02:13:31 ip-172-16-15-17.us-west-2.compute.internal mesos-master[15468]: I0526 02:13:31.103612 15491 master.cpp:8592] Marking agent 89d2d679-fa08-49be-94c3-880ebb595212-S0 (172.16.3.236) unreachable: health check timed out
May 26 02:13:31 ip-172-16-15-17.us-west-2.compute.internal mesos-master[15468]: I0526 02:13:31.108093 15495 master.cpp:8635] Marked agent 89d2d679-fa08-49be-94c3-880ebb595212-S0 (172.16.3.236) unreachable: health check timed out
…
May 26 02:13:31 ip-172-16-15-17.us-west-2.compute.internal mesos-master[15468]: I0526 02:13:31.108419 15495 master.cpp:11149] Updating the state of task app10.instance-1f70be9f-9ef5-11ea-8981-9a93e42a6514._app.2 of framework 89d2d679-fa08-49be-94c3-880ebb595212-0000 (latest state: TASK_UNREACHABLE, status update state: TASK_UNREACHABLE)
May 26 02:13:31 ip-172-16-15-17.us-west-2.compute.internal mesos-master[15468]: I0526 02:13:31.108865 15495 master.cpp:11149] Updating the state of task app9.instance-954f91ad-9ef4-11ea-8981-9a93e42a6514._app.1 of framework 89d2d679-fa08-49be-94c3-880ebb595212-0000 (latest state: TASK_UNREACHABLE, status update state: TASK_UNREACHABLE)
...{code}
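The master log lines above attribute the transition to a "health check timeout". As a rough illustration of that mechanism, here is a simplified Python model of a ping-based agent health check. It is a sketch under stated assumptions, not Mesos source code. The class `AgentHealth` and its methods are hypothetical. The defaults are assumed to mirror the Mesos master flags `--agent_ping_timeout` (15secs) and `--max_agent_ping_timeouts` (5).

{code:python}
from dataclasses import dataclass


@dataclass
class AgentHealth:
    """Hypothetical, simplified model of the master's agent health check.

    Not Mesos source code: it only illustrates "N consecutive missed pings
    => agent marked unreachable".
    """

    ping_timeout_secs: float = 15.0  # assumed default of --agent_ping_timeout
    max_ping_timeouts: int = 5       # assumed default of --max_agent_ping_timeouts
    consecutive_timeouts: int = 0
    unreachable: bool = False

    def on_ping(self, ponged: bool) -> None:
        """Record the outcome of one ping round."""
        if self.unreachable:
            return  # once marked unreachable, further pings are ignored here
        if ponged:
            # A timely reply resets the count of consecutive misses.
            self.consecutive_timeouts = 0
        else:
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.max_ping_timeouts:
                self.unreachable = True

    def detection_window_secs(self) -> float:
        """Approximate time from the last reply until the agent is marked unreachable."""
        return self.ping_timeout_secs * self.max_ping_timeouts
{code}

Under these assumed defaults, a host that stops answering pings (for example, because it is thrashing under memory pressure) would be marked unreachable roughly 75 seconds after its last successful reply. Raising either flag would tolerate a temporarily overloaded host for longer, at the cost of slower failure detection.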
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)