Posted to yarn-issues@hadoop.apache.org by "fanshilun (Jira)" <ji...@apache.org> on 2022/08/26 02:33:00 UTC

[jira] [Updated] (YARN-6667) Handle containerId duplicate without failing the heartbeat in Federation Interceptor

     [ https://issues.apache.org/jira/browse/YARN-6667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fanshilun updated YARN-6667:
----------------------------
    Description: 
In practice, the probability of this happening is very low.
It can only be caused by a YARN master-slave failover and an incorrect Epoch parameter configuration.

We will try to tolerate this situation and keep the Application running as far as possible, using the following measures:
1. Select for allocation a node whose heartbeat has not timed out and that is in the RUNNING state.
2. If neither RM's heartbeat has timed out and both are in the RUNNING state, select the RM that previously allocated the Container to handle it.
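
The two measures above can be sketched in Java roughly as follows. This is only an illustration of the selection rule, not the actual patch: the class DuplicateContainerResolver, the RmCandidate snapshot, the RmState enum, and the timeout parameter are all hypothetical names, and the sketch assumes the "node" in measure 1 refers to an RM. The description does not say what happens when neither RM qualifies, so the sketch returns an empty result in that case.

```java
import java.util.Optional;

/** Hypothetical helper; not part of the Hadoop code base. */
final class DuplicateContainerResolver {

  /** Hypothetical state enum; only RUNNING matters for this sketch. */
  enum RmState { RUNNING, LOST, UNKNOWN }

  /** Hypothetical snapshot of one RM that reported the duplicated containerId. */
  static final class RmCandidate {
    final String id;
    final RmState state;
    final long lastHeartbeatMillis;

    RmCandidate(String id, RmState state, long lastHeartbeatMillis) {
      this.id = id;
      this.state = state;
      this.lastHeartbeatMillis = lastHeartbeatMillis;
    }
  }

  /** Measure 1: the heartbeat has not timed out and the state is RUNNING. */
  static boolean isEligible(RmCandidate rm, long nowMillis, long timeoutMillis) {
    return rm.state == RmState.RUNNING
        && nowMillis - rm.lastHeartbeatMillis <= timeoutMillis;
  }

  /**
   * Picks the RM that should own the duplicated container.
   * Measure 2: when both RMs are eligible, the previous allocator wins.
   * Returns empty when neither RM is eligible (an assumption of this sketch).
   */
  static Optional<String> chooseOwner(RmCandidate previous, RmCandidate incoming,
                                      long nowMillis, long timeoutMillis) {
    if (isEligible(previous, nowMillis, timeoutMillis)) {
      // Covers both "only previous is eligible" and "both are eligible".
      return Optional.of(previous.id);
    }
    if (isEligible(incoming, nowMillis, timeoutMillis)) {
      return Optional.of(incoming.id);
    }
    return Optional.empty();
  }
}
```

With this shape the heartbeat handler can keep processing the response and simply ignore the container reported by the RM that was not chosen, instead of throwing and failing the whole heartbeat.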

> Handle containerId duplicate without failing the heartbeat in Federation Interceptor
> ------------------------------------------------------------------------------------
>
>                 Key: YARN-6667
>                 URL: https://issues.apache.org/jira/browse/YARN-6667
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Botong Huang
>            Assignee: Botong Huang
>            Priority: Minor
>
> In practice, the probability of this happening is very low.
> It can only be caused by a YARN master-slave failover and an incorrect Epoch parameter configuration.
> We will try to tolerate this situation and keep the Application running as far as possible, using the following measures:
> 1. Select for allocation a node whose heartbeat has not timed out and that is in the RUNNING state.
> 2. If neither RM's heartbeat has timed out and both are in the RUNNING state, select the RM that previously allocated the Container to handle it.


