Posted to yarn-issues@hadoop.apache.org by "liyakun (JIRA)" <ji...@apache.org> on 2019/03/05 07:16:00 UTC

[jira] [Updated] (YARN-9345) NM actively does not accept new containers in the heartbeat

     [ https://issues.apache.org/jira/browse/YARN-9345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyakun updated YARN-9345:
--------------------------
    Description: 
At present, the NodeManager (NM) has only one health check mechanism. If the node enters an unhealthy state, all containers running on it are killed.
 However, a node can be unhealthy in two ways: it can be unavailable for a long period (the case the current health mechanism handles), or it can be under only temporary pressure.
 Under temporary pressure (for example, a short-lived high load), the node only needs to wait a while before it returns to normal.
 To support this, we need to extend the health check with a state in which the node temporarily does not accept new tasks (containers that are already running are not killed).

  was:
At present, the NodeManager (NM) has only one health check mechanism. If the node enters an unhealthy state, all containers running on it are killed.
However, a node can be unhealthy in two ways: it can be unavailable for a long period (the case the current health mechanism handles), or it can be under only temporary pressure.
Under temporary pressure (for example, a short-lived high load), the node only needs to wait a while before it returns to normal.
To support this, we need to extend the health check with a state in which the node temporarily does not accept new tasks.


> NM actively does not accept new containers in the heartbeat
> -----------------------------------------------------------
>
>                 Key: YARN-9345
>                 URL: https://issues.apache.org/jira/browse/YARN-9345
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager
>            Reporter: liyakun
>            Assignee: liyakun
>            Priority: Major
>
> At present, the NodeManager (NM) has only one health check mechanism. If the node enters an unhealthy state, all containers running on it are killed.
>  However, a node can be unhealthy in two ways: it can be unavailable for a long period (the case the current health mechanism handles), or it can be under only temporary pressure.
>  Under temporary pressure (for example, a short-lived high load), the node only needs to wait a while before it returns to normal.
>  To support this, we need to extend the health check with a state in which the node temporarily does not accept new tasks (containers that are already running are not killed).
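
The proposed behavior can be sketched roughly as a three-state model. This is only an illustration of the idea in the description, not code from the YARN code base: the class name NodeHealthSketch, the state names, and the load-threshold policy in classify() are all assumptions made for this example.

```java
/**
 * Hypothetical sketch of a three-state node health model. None of these
 * names come from the YARN code base; they only illustrate the proposal.
 */
final class NodeHealthSketch {

    /** Illustrative states; TEMPORARILY_BUSY stands for the state the issue proposes to add. */
    enum State { HEALTHY, TEMPORARILY_BUSY, UNHEALTHY }

    /** New containers are accepted only while the node is fully healthy. */
    static boolean acceptsNewContainers(State state) {
        return state == State.HEALTHY;
    }

    /** Running containers are killed only in the long-term unhealthy state. */
    static boolean killsRunningContainers(State state) {
        return state == State.UNHEALTHY;
    }

    /**
     * Assumed policy for this sketch: a node is temporarily busy when its load
     * exceeds a threshold while the regular health check still passes.
     */
    static State classify(boolean healthCheckPassed, double load, double loadThreshold) {
        if (!healthCheckPassed) {
            return State.UNHEALTHY;
        }
        return load > loadThreshold ? State.TEMPORARILY_BUSY : State.HEALTHY;
    }

    public static void main(String[] args) {
        // Print how each state treats new and running containers.
        for (State s : State.values()) {
            System.out.printf("%-17s accept=%b kill=%b%n",
                    s, acceptsNewContainers(s), killsRunningContainers(s));
        }
    }
}
```

The point of the extra state is that refusing new containers and killing running ones become separate decisions, so a node under short-lived load can recover without losing work that is already in progress.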


