Posted to yarn-issues@hadoop.apache.org by "Giovanni Matteo Fumarola (JIRA)" <ji...@apache.org> on 2018/07/25 21:49:00 UTC

[jira] [Resolved] (YARN-8580) yarn.resourcemanager.am.max-attempts is not respected for yarn services

     [ https://issues.apache.org/jira/browse/YARN-8580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Giovanni Matteo Fumarola resolved YARN-8580.
--------------------------------------------
    Resolution: Invalid

> yarn.resourcemanager.am.max-attempts is not respected for yarn services
> -----------------------------------------------------------------------
>
>                 Key: YARN-8580
>                 URL: https://issues.apache.org/jira/browse/YARN-8580
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn-native-services
>    Affects Versions: 3.1.1
>            Reporter: Yesha Vora
>            Priority: Major
>
> 1) The maximum number of AM attempts is set to 100 on all nodes (including the gateway):
> {code}
> <property>
>   <name>yarn.resourcemanager.am.max-attempts</name>
>   <value>100</value>
> </property>
> {code}
> 2) Start a YARN service application (HBase tarball).
> 3) Kill the AM 20 times.
> The application then fails with the diagnostics below, even though the configured limit of 100 has not been reached:
> {code}
> bash-4.2$ /usr/hdp/current/hadoop-yarn-client/bin/yarn application -status application_1532481557746_0001
> 18/07/25 18:43:34 INFO client.AHSProxy: Connecting to Application History server at xxx/xxx:10200
> 18/07/25 18:43:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
> 18/07/25 18:43:34 INFO conf.Configuration: found resource resource-types.xml at file:/etc/hadoop/3.0.0.0-1634/0/resource-types.xml
> Application Report : 
> 	Application-Id : application_1532481557746_0001
> 	Application-Name : hbase-tarball-lr
> 	Application-Type : yarn-service
> 	User : hbase
> 	Queue : default
> 	Application Priority : 0
> 	Start-Time : 1532481864863
> 	Finish-Time : 1532522943103
> 	Progress : 100%
> 	State : FAILED
> 	Final-State : FAILED
> 	Tracking-URL : https://xxx:8090/cluster/app/application_1532481557746_0001
> 	RPC Port : -1
> 	AM Host : N/A
> 	Aggregate Resource Allocation : 252150112 MB-seconds, 164141 vcore-seconds
> 	Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
> 	Log Aggregation Status : SUCCEEDED
> 	Diagnostics : Application application_1532481557746_0001 failed 20 times (global limit =100; local limit is =20) due to AM Container for appattempt_1532481557746_0001_000020 exited with  exitCode: 137
> Failing this attempt.Diagnostics: [2018-07-25 12:49:00.784]Container killed on request. Exit code is 137
> [2018-07-25 12:49:03.045]Container exited with a non-zero exit code 137. 
> [2018-07-25 12:49:03.045]Killed by external signal
> For more detailed output, check the application tracking page: https://xxx:8090/cluster/app/application_1532481557746_0001 Then click on links to logs of each attempt.
> . Failing the application.
> 	Unmanaged Application : false
> 	Application Node Label Expression : <Not set>
> 	AM container Node Label Expression : <DEFAULT_PARTITION>
> 	TimeoutType : LIFETIME	ExpiryTime : 2018-07-25T22:26:15.419+0000	RemainingTime : 0seconds
> {code}
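
The diagnostics above report two limits: a global limit of 100 (the configured yarn.resourcemanager.am.max-attempts) and a local limit of 20 for this application. The following is a minimal sketch, not code from the issue, of how the two values could be read out of the diagnostics string and combined. It assumes that a per-application limit is honoured only when it is positive and does not exceed the global limit, and that for YARN services the per-application value comes from a service-level setting (believed to be yarn.service.am-restart.max-attempts with a default of 20; verify this against the documentation for your release). The names effective_max_attempts and parse_limits are hypothetical helpers introduced for illustration.

```python
import re


def effective_max_attempts(global_limit: int, app_limit: int) -> int:
    """Return the attempt limit that would apply to one application.

    Assumption: the per-application limit wins only when it is positive
    and no larger than the global limit; otherwise the global limit applies.
    """
    if app_limit <= 0 or app_limit > global_limit:
        return global_limit
    return app_limit


# Matches the limits in the form shown in the diagnostics above.
LIMITS = re.compile(
    r"failed (\d+) times \(global limit =(\d+); local limit is =(\d+)\)"
)


def parse_limits(diagnostics: str):
    """Extract (failed, global_limit, local_limit), or None if not present."""
    match = LIMITS.search(diagnostics)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


# Diagnostics line copied from the application report in this issue.
diagnostics = (
    "Application application_1532481557746_0001 failed 20 times "
    "(global limit =100; local limit is =20) due to AM Container for "
    "appattempt_1532481557746_0001_000020 exited with  exitCode: 137"
)

failed, global_limit, local_limit = parse_limits(diagnostics)
limit = effective_max_attempts(global_limit, local_limit)
print(f"failed={failed} effective_limit={limit}")
```

Under these assumptions the application is failed after 20 attempts because the local limit is the smaller of the two, which would be consistent with the issue being resolved as Invalid rather than fixed.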


