Posted to yarn-issues@hadoop.apache.org by "Sushanta Sen (Jira)" <ji...@apache.org> on 2021/03/04 14:16:00 UTC

[jira] [Updated] (YARN-10670) YARN: Opportunistic Container: A distributed shell job fails if any container is killed, but the application should not fail when containers are killed to make room for guaranteed containers

     [ https://issues.apache.org/jira/browse/YARN-10670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sushanta Sen updated YARN-10670:
--------------------------------
    Description: 
Preconditions:
 # A secure Hadoop 3.1.1 cluster with 3 nodes is installed.
 # Set the following parameter in the RM:
 <property>
 <name>yarn.resourcemanager.opportunistic-container-allocation.enabled</name>
 <value>true</value>
 </property>
 # Set the following parameter in the NM(s):
 <property>
 <name>yarn.nodemanager.opportunistic-containers-max-queue-length</name>
 <value>30</value>
 </property>
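
For reproducing the setup, the two settings listed in the preconditions can be rendered as fragments to paste into yarn-site.xml on the RM and on each NM. The following is a minimal sketch in Python; the property names and values come from the preconditions, while the helper name render_properties and the dictionary names are hypothetical and only serve the illustration.

```python
from xml.sax.saxutils import escape

# Property names and values as listed in the preconditions.
RM_PROPS = {
    "yarn.resourcemanager.opportunistic-container-allocation.enabled": "true",
}
NM_PROPS = {
    "yarn.nodemanager.opportunistic-containers-max-queue-length": "30",
}


def render_properties(props):
    """Return a <configuration> fragment with one <property> per entry.

    Hypothetical helper: it only formats text and does not touch any
    running cluster or existing configuration file.
    """
    lines = ["<configuration>"]
    for name, value in props.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines)


if __name__ == "__main__":
    print("RM fragment:")
    print(render_properties(RM_PROPS))
    print("NM fragment:")
    print(render_properties(NM_PROPS))
```

The output is meant to be merged by hand into the existing configuration of each daemon, which typically has to be restarted for the change to take effect.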

 
 Test Steps:

Job Command: yarn org.apache.hadoop.yarn.applications.distributedshell.Client -jar HDFS/hadoop/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-3.1.1-hw-ei-310001-SNAPSHOT.jar -shell_command sleep -shell_args 20 -num_containers 20 -container_type OPPORTUNISTIC

Actual Result: The distributed shell YARN job failed with the following diagnostics message:
{noformat}
Attempt recovered after RM restartApplication Failure: desired = 20, completed = 20, allocated = 20, failed = 1, diagnostics = [2021-02-09 22:11:48.440]Container De-queued to meet NM queuing limits.
[2021-02-09 22:11:48.441]Container terminated before launch.
{noformat}
 Expected Result: The distributed shell YARN job should not fail, because the container was de-queued by the NM rather than failing on its own.
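
The expected behaviour can be sketched as a small counting policy. This is an illustration only, not the actual distributed shell ApplicationMaster code: it assumes the de-queued container reports an exit status such as KILLED_BY_CONTAINER_SCHEDULER, and that the numeric values below mirror Hadoop's ContainerExitStatus (verify them against your Hadoop version). The names Counters, on_container_completed and app_succeeded are hypothetical.

```python
from dataclasses import dataclass

# Assumed to mirror org.apache.hadoop.yarn.api.records.ContainerExitStatus;
# check the values against the Hadoop version in use.
SUCCESS = 0
ABORTED = -100
KILLED_BY_CONTAINER_SCHEDULER = -108

# Hypothetical policy: exit statuses meaning the framework took the
# container away, so the application should ask again instead of failing.
RETRYABLE = {ABORTED, KILLED_BY_CONTAINER_SCHEDULER}


@dataclass
class Counters:
    desired: int
    allocated: int = 0
    completed: int = 0
    failed: int = 0


def on_container_completed(counters, exit_status):
    """Update the counters and return how many replacements to request."""
    if exit_status in RETRYABLE:
        # Not the application's fault: forget this allocation and re-request.
        counters.allocated -= 1
        return 1
    counters.completed += 1
    if exit_status != SUCCESS:
        counters.failed += 1
    return 0


def app_succeeded(counters):
    """The job succeeds when every desired container completed cleanly."""
    return counters.completed == counters.desired and counters.failed == 0


if __name__ == "__main__":
    # 20 containers as in the job command; one is de-queued by the NM.
    c = Counters(desired=20, allocated=20)
    statuses = [SUCCESS] * 19 + [KILLED_BY_CONTAINER_SCHEDULER]
    retries = sum(on_container_completed(c, s) for s in statuses)
    c.allocated += retries  # replacement containers get allocated
    for _ in range(retries):
        on_container_completed(c, SUCCESS)
    print(c, "succeeded:", app_succeeded(c))
```

Under this policy the de-queued container is never counted in failed, so the run ends with completed equal to desired once the replacement finishes.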


> YARN: Opportunistic Container: A distributed shell job fails if any container is killed, but the application should not fail when containers are killed to make room for guaranteed containers
> ---------------------------------------------------------------------
>
>                 Key: YARN-10670
>                 URL: https://issues.apache.org/jira/browse/YARN-10670
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: distributed-shell
>    Affects Versions: 3.1.1
>            Reporter: Sushanta Sen
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
