Posted to yarn-dev@hadoop.apache.org by Henry Saputra <he...@gmail.com> on 2015/06/03 23:01:06 UTC

Question about effect of yarn.resourcemanager.am.max-attempts config for already running ApplicationMaster

Hi All,

I would like to know whether the "yarn.resourcemanager.am.max-attempts"
config parameter gives an ApplicationMaster (AM) HA behavior in YARN
once it is already running.

That is, if the running AM process dies (through PermGen exhaustion, OOM,
or the JVM being killed with a kill signal), should the ResourceManager
(RM) be able to restart it up to the number of times specified by the
"yarn.resourcemanager.am.max-attempts" config value?

I tried it, and it seems there was an attempt to restart the AppMaster,
but it died immediately.


Thanks,

Henry

Re: Question about effect of yarn.resourcemanager.am.max-attempts config for already running ApplicationMaster

Posted by Henry Saputra <he...@gmail.com>.
Thanks Steve and Rohith,

So yeah, I just realized it is a start counter instead of a restart
counter =P

I tried it with a different AppMaster after restarting the RM and it is
working. It seems the problem was with the other AppMaster. I will try
to dig into why it went wrong.

Thanks again for the insights and help!

- Henry

On Thu, Jun 4, 2015 at 2:12 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
>> On 3 Jun 2015, at 22:01, Henry Saputra <he...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I would like to know if "yarn.resourcemanager.am.max-attempts" config
>> parameter will make the already running ApplicationMaster (AM) to have
>> HA mode in YARN once it is already running?
>>
>
> if you can reconfigure the RM and restart it, the value will be picked up by the RM (rolling upgrades and an HA cluster lets you do that)
>
> for long-lived services, you should have the cluster set up with a window for failures, so that sporadic, intermittent failures don't kill the app.
>
>> Meaning that if the running AM process dies (though permgen, OOM, or
>> kill JVM with kill signal) then ResourceManager (RM) should be able to
>> restart the number of times specified by
>> "yarn.resourcemanager.am.max-attempts" config value ?
>
> yes, though its a "start counter", not a restart counter. That first run counts as attempt #1
>
>>
>> I was trying it and it seems like the there was an attempt to restart
>> the AppMaster but dies immediately.
>>
>
> with a default cluster restart value of 2, two failures in a row is enough to kill the app.
>
> In https://issues.apache.org/jira/browse/YARN-2392  I've a patch to give you more details on count-exceeded values; global and app limits, plus window details.

Re: Question about effect of yarn.resourcemanager.am.max-attempts config for already running ApplicationMaster

Posted by Steve Loughran <st...@hortonworks.com>.
> On 3 Jun 2015, at 22:01, Henry Saputra <he...@gmail.com> wrote:
> 
> Hi All,
> 
> I would like to know if "yarn.resourcemanager.am.max-attempts" config
> parameter will make the already running ApplicationMaster (AM) to have
> HA mode in YARN once it is already running?
> 

If you can reconfigure the RM and restart it, the RM will pick up the new value (rolling upgrades and an HA cluster let you do that).

For long-lived services, you should have the cluster set up with a window for failures, so that sporadic, intermittent failures don't kill the app.
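
As a rough sketch of what a failure window means (a hypothetical model, not YARN code; the class and method names below are made up for illustration): only failures that fall inside the window count against the attempt limit, so an old failure is eventually forgotten. In YARN itself, as far as I know, the per-application knob for this is ApplicationSubmissionContext#setAttemptFailuresValidityInterval.

```java
// Hypothetical model of a failure window; names are illustrative, not YARN's.
final class FailureWindow {
    private final int maxAttempts;    // attempts allowed inside one window
    private final long windowMillis;  // failures older than this are forgotten
    private final java.util.ArrayDeque<Long> failures = new java.util.ArrayDeque<>();

    FailureWindow(int maxAttempts, long windowMillis) {
        if (maxAttempts < 1 || windowMillis < 0) {
            throw new IllegalArgumentException("invalid window settings");
        }
        this.maxAttempts = maxAttempts;
        this.windowMillis = windowMillis;
    }

    /**
     * Records an AM failure at time nowMillis (timestamps must not go backwards).
     * Returns true if the app may be retried, false if the limit is reached.
     */
    boolean recordFailure(long nowMillis) {
        failures.addLast(nowMillis);
        // Drop failures that have aged out of the window.
        while (!failures.isEmpty() && nowMillis - failures.peekFirst() > windowMillis) {
            failures.removeFirst();
        }
        return failures.size() < maxAttempts;
    }
}
```

With a limit of 2 and a 60-second window, a failure at t=0 and another at t=100s both allow a retry, because the first one has aged out by then; a third failure at t=110s lands in the same window as the second and ends the app.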

> Meaning that if the running AM process dies (though permgen, OOM, or
> kill JVM with kill signal) then ResourceManager (RM) should be able to
> restart the number of times specified by
> "yarn.resourcemanager.am.max-attempts" config value ?

Yes, though it's a "start counter", not a restart counter. The first run counts as attempt #1.
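
To make the start-counter semantics concrete, here is a minimal sketch (a hypothetical model, not the RM's actual code; AmStartCounter and its methods are invented for illustration). Every launch, including the very first one, uses up one unit of the max-attempts budget.

```java
// Hypothetical model of the "start counter" semantics; not YARN code.
final class AmStartCounter {
    private final int maxAttempts; // e.g. the yarn.resourcemanager.am.max-attempts value
    private int starts = 0;        // launches so far, including the first one

    AmStartCounter(int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.maxAttempts = maxAttempts;
    }

    /** Records one AM launch; returns false once the budget is spent. */
    boolean tryStart() {
        if (starts >= maxAttempts) {
            return false;
        }
        starts++;
        return true;
    }

    /** Launches still available after the ones recorded so far. */
    int remaining() {
        return maxAttempts - starts;
    }
}
```

With a value of 2, the first tryStart() is the initial launch (attempt #1), the second is the only restart, and a third call returns false.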

> 
> I was trying it and it seems like the there was an attempt to restart
> the AppMaster but dies immediately.
> 

With a default cluster restart value of 2, two failures in a row are enough to kill the app.

In https://issues.apache.org/jira/browse/YARN-2392 I have a patch that gives you more details on count-exceeded values: global and app limits, plus window details.

RE: Question about effect of yarn.resourcemanager.am.max-attempts config for already running ApplicationMaster

Posted by Rohith Sharma K S <ro...@huawei.com>.
Hi

"yarn.resourcemanager.am.max-attempts" is a global-level configuration that applies to all application masters. It is the maximum number of attempts any application can make to launch its AM.

Similarly, an individual application can specify its own max attempts using ApplicationSubmissionContext#setMaxAppAttempts while submitting the application, and this value should be in the range [1, globalMaxAttempts]. This also works in an HA scenario.
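
A small sketch of how the two limits combine (hypothetical helper, not the RM's code; it assumes that a per-application value outside [1, globalMaxAttempts] falls back to the global value, which is my understanding of the RM's behavior but is not stated in this thread):

```java
// Hypothetical helper; MaxAttempts and effective() are illustrative names.
final class MaxAttempts {
    private MaxAttempts() {}

    /**
     * Returns the attempt limit that applies to one application.
     * requested: value the app passed at submission time (0 or less means unset).
     * globalMax: the cluster-wide yarn.resourcemanager.am.max-attempts value.
     * Assumption: out-of-range requests fall back to the global limit.
     */
    static int effective(int requested, int globalMax) {
        if (globalMax < 1) {
            throw new IllegalArgumentException("globalMax must be >= 1");
        }
        if (requested <= 0 || requested > globalMax) {
            return globalMax;
        }
        return requested;
    }
}
```

For example, with a global limit of 2, a request of 1 is honored, while a request of 5 (or an unset value) results in 2.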

 
In your case, the RM is able to restart the attempt, but the AM fails to launch and dies immediately. To find the reason, you need to look at the application master console log to see why the AM launch is failing.


Thanks & Regards
Rohith Sharma K S

-----Original Message-----
From: Henry Saputra [mailto:henry.saputra@gmail.com] 
Sent: 04 June 2015 02:31
To: yarn-dev@hadoop.apache.org
Subject: Question about effect of yarn.resourcemanager.am.max-attempts config for already running ApplicationMaster

Hi All,

I would like to know whether the "yarn.resourcemanager.am.max-attempts" config parameter gives an ApplicationMaster (AM) HA behavior in YARN once it is already running.

That is, if the running AM process dies (through PermGen exhaustion, OOM, or the JVM being killed with a kill signal), should the ResourceManager (RM) be able to restart it up to the number of times specified by the "yarn.resourcemanager.am.max-attempts" config value?

I tried it, and it seems there was an attempt to restart the AppMaster, but it died immediately.


Thanks,

Henry