Posted to common-user@hadoop.apache.org by Prabhu Joseph <pr...@gmail.com> on 2016/02/09 05:34:57 UTC

Long running Spark job on YARN throws "No AMRMToken"

Hi All,

    A long-running Spark job on YARN throws the exception below after running
for a few days.

yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.YarnException: *No AMRMToken found* for
user prabhu at
org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)

Would any of the following renew the AMRMToken and solve the issue?

1. Increasing yarn.resourcemanager.delegation.token.max-lifetime from 7 days

2. Configuring Proxy user:

<property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value>
</property>
<property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value>
</property>

3. Can Spark 1.4.0 handle this with the fix in
https://issues.apache.org/jira/browse/SPARK-5342
(spark.yarn.credentials.file)?


How can the AMRMToken be renewed for a long-running job on YARN?


Thanks,
Prabhu Joseph

Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Steve Loughran <st...@hortonworks.com>.
On 11 Feb 2016, at 15:24, Prabhu Joseph <pr...@gmail.com> wrote:

> Steve,
>
> When an application is submitted to the ResourceManager, AMLauncher creates
> the token YARN_AM_RM_TOKEN (the token used between RM and AM). When the
> ApplicationMaster is launched, it contacts the RM to register, to send
> allocate requests for containers, and to finish. In all these requests,

yes, see

https://github.com/steveloughran/hadoop-trunk/blob/HADOOP-12649-security/YARN-4653-yarn/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md

> the ResourceManager calls authorizeRequest, which checks whether the current
> user has the token YARN_AM_RM_TOKEN; if not, it throws "No AMRMToken".

yes; prior to YARN-3103 it used the login user

> Every yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-sec the
> ResourceManager rolls the master key. Before rolling it, there is a period
> of 1.5 * yarn.am.liveness-monitor.expiry-interval-ms during which, if the AM
> contacts the RM with an allocate request, the RM checks whether the AM's
> YARN_AM_RM_TOKEN was prepared using the previous master key; if so, it
> updates the AM user with a YARN_AM_RM_TOKEN prepared using the new master key.
>
> If the AM contacts the RM with a YARN_AM_RM_TOKEN that was constructed using
> neither the current nor the previous master key, an "Invalid AMRMToken"
> message is thrown. This is the error that occurs if the AM has not been
> updated with the new RM master key [YARN-3103 and YARN-2212].
>
> I need your help to find a scenario where "No AMRMToken" can happen: a user
> was given a token, but later that token is missing. Is the token removed once
> it has expired?


...or there's some confusion about the current user

I've got a Java class to help with credential creation and diagnostics, not yet ported to Hadoop core, which can do some listing & dumping of credentials

https://github.com/apache/incubator-slider/blob/develop/slider-core/src/main/java/org/apache/slider/core/launch/CredentialUtils.java

you may be able to copy that code and use it to print out what tokens the current user has; otherwise I don't know. I've never personally hit the message

Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Prabhu Joseph <pr...@gmail.com>.
Steve,


When an application is submitted to the ResourceManager, AMLauncher
creates the token YARN_AM_RM_TOKEN (the token used between RM and AM). When
the ApplicationMaster is launched, it contacts the RM to register, to send
allocate requests for containers, and to finish. For each of these requests
the ResourceManager calls authorizeRequest, which checks whether the current
user has the token YARN_AM_RM_TOKEN; if not, it throws "No AMRMToken".

Every yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-sec the
ResourceManager rolls the master key. Before rolling it, there is a period
of 1.5 * yarn.am.liveness-monitor.expiry-interval-ms during which, if the AM
contacts the RM with an allocate request, the RM checks whether the AM's
YARN_AM_RM_TOKEN was prepared using the previous master key; if so, it
updates the AM user with a YARN_AM_RM_TOKEN prepared using the new master key.

If the AM contacts the RM with a YARN_AM_RM_TOKEN that was constructed using
neither the current nor the previous master key, an "Invalid AMRMToken"
message is thrown. This is the error that occurs if the AM has not been
updated with the new RM master key [YARN-3103 and YARN-2212].
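
To make the outcomes described above concrete, here is a toy Python model of the RM-side check. It is only a sketch of the behaviour as I understand it, not the Hadoop code: the class and function names (ToyResourceManager, Token, activation_delay_ms) are hypothetical, the two interval values are assumed for illustration, and the model keeps the previous key until the next roll instead of tracking real time.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed values for illustration only; read the real ones from yarn-site.xml.
ROLLING_INTERVAL_SEC = 24 * 60 * 60      # master-key-rolling-interval-sec
AM_EXPIRY_INTERVAL_MS = 10 * 60 * 1000   # yarn.am.liveness-monitor.expiry-interval-ms


def activation_delay_ms(am_expiry_interval_ms: int) -> int:
    """Length of the 1.5 * expiry-interval window mentioned above."""
    return int(1.5 * am_expiry_interval_ms)


@dataclass(frozen=True)
class Token:
    kind: str    # e.g. "YARN_AM_RM_TOKEN"
    key_id: int  # id of the master key the token was prepared with


class ToyResourceManager:
    """Hypothetical stand-in for the RM side of the check."""

    def __init__(self) -> None:
        self.current_key = 1
        self.previous_key: Optional[int] = None

    def roll_master_key(self) -> None:
        # Simplification: the previous key stays valid until the next roll.
        self.previous_key = self.current_key
        self.current_key += 1

    def authorize(self, user_tokens: List[Token]) -> str:
        amrm = next((t for t in user_tokens if t.kind == "YARN_AM_RM_TOKEN"), None)
        if amrm is None:
            # The user carries no AMRM token at all.
            return "No AMRMToken found"
        if amrm.key_id == self.current_key:
            return "ok"
        if amrm.key_id == self.previous_key:
            # The RM would hand back a token prepared with the new key.
            return "ok, new token issued"
        return "Invalid AMRMToken"
```

In this model a token that merely aged past a key roll ends in "Invalid AMRMToken"; "No AMRMToken found" is only reached when the token is absent from the user's credentials, which is the case I cannot explain.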

I need your help to find a scenario where "No AMRMToken" can happen: a user
was given a token, but later that token is missing. Is the token removed once
it has expired?


Thanks,
Prabhu Joseph

On Wed, Feb 10, 2016 at 12:59 AM, Hari Shreedharan <
hshreedharan@cloudera.com> wrote:

> The credentials file approach (using keytab for spark apps) will only
> update HDFS tokens. YARN's AMRM tokens should be taken care of by YARN
> internally.
>
> Steve - correct me if I am wrong here: If the AMRM tokens are disappearing
> it might be a YARN bug (does the AMRM token have a 7 day limit as well? I
> thought that was only for HDFS).
>
>
> Thanks,
> Hari
>
> On Tue, Feb 9, 2016 at 8:44 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>> On 9 Feb 2016, at 11:26, Steve Loughran <st...@hortonworks.com> wrote:
>>
>>
>> On 9 Feb 2016, at 05:55, Prabhu Joseph <pr...@gmail.com>
>> wrote:
>>
>> + Spark-Dev
>>
>> On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <
>> prabhujose.gates@gmail.com> wrote:
>>
>>> Hi All,
>>>
>>>     A long running Spark job on YARN throws below exception after
>>> running for few days.
>>>
>>> yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
>>> org.apache.hadoop.yarn.exceptions.YarnException: *No AMRMToken found* for
>>> user prabhu at
>>> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
>>>
>>> Do any of the below renew the AMRMToken and solve the issue
>>>
>>> 1. yarn.resourcemanager.delegation.token.max-lifetime increase from 7
>>> days
>>>
>>> 2. Configuring Proxy user:
>>>
>>> <property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value>
>>> </property>
>>> <property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value>
>>> </property>
>>>
>>
>> wouldn't do that: security issues
>>
>>
>>> 3. Can Spark-1.4.0 handle with fix
>>> https://issues.apache.org/jira/browse/SPARK-5342
>>>
>>>     spark.yarn.credentials.file
>>>
>>>
>>>
>> I'll say "maybe" there
>>
>>
>> uprated to a no, having looked at the code more
>>
>>
>> How to renew the AMRMToken for a long running job on YARN?
>>>
>>>
>>>
>>
>> AMRM token renewal should be automatic in AM; YARN sends a message to the
>> AM (actually an allocate() response with no containers but a new token at
>> the tail of the message).
>>
>> I don't see any logging in the Hadoop code there (AMRMClientImpl); filed
>> YARN-4682 to add a log statement
>>
>> if someone other than me were to supply a patch to that JIRA to add a log
>> statement *by the end of the day* I'll review it and get it in to Hadoop 2.8
>>
>>
>> like I said: I'll get this in to hadoop-2.8 if someone is timely with the
>> diff
>>
>>
>

Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Hari Shreedharan <hs...@cloudera.com>.
The credentials file approach (using keytab for spark apps) will only
update HDFS tokens. YARN's AMRM tokens should be taken care of by YARN
internally.

Steve - correct me if I am wrong here: If the AMRM tokens are disappearing
it might be a YARN bug (does the AMRM token have a 7 day limit as well? I
thought that was only for HDFS).


Thanks,
Hari

On Tue, Feb 9, 2016 at 8:44 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> On 9 Feb 2016, at 11:26, Steve Loughran <st...@hortonworks.com> wrote:
>
>
> On 9 Feb 2016, at 05:55, Prabhu Joseph <pr...@gmail.com> wrote:
>
> + Spark-Dev
>
> On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <prabhujose.gates@gmail.com
> > wrote:
>
>> Hi All,
>>
>>     A long running Spark job on YARN throws below exception after running
>> for few days.
>>
>> yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
>> org.apache.hadoop.yarn.exceptions.YarnException: *No AMRMToken found* for
>> user prabhu at
>> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
>>
>> Do any of the below renew the AMRMToken and solve the issue
>>
>> 1. yarn.resourcemanager.delegation.token.max-lifetime increase from 7 days
>>
>> 2. Configuring Proxy user:
>>
>> <property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value>
>> </property>
>> <property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value>
>> </property>
>>
>
> wouldn't do that: security issues
>
>
>> 3. Can Spark-1.4.0 handle with fix
>> https://issues.apache.org/jira/browse/SPARK-5342
>>
>>     spark.yarn.credentials.file
>>
>>
>>
> I'll say "maybe" there
>
>
> uprated to a no, having looked at the code more
>
>
> How to renew the AMRMToken for a long running job on YARN?
>>
>>
>>
>
> AMRM token renewal should be automatic in AM; YARN sends a message to the
> AM (actually an allocate() response with no containers but a new token at
> the tail of the message).
>
> I don't see any logging in the Hadoop code there (AMRMClientImpl); filed
> YARN-4682 to add a log statement
>
> if someone other than me were to supply a patch to that JIRA to add a log
> statement *by the end of the day* I'll review it and get it in to Hadoop 2.8
>
>
> like I said: I'll get this in to hadoop-2.8 if someone is timely with the
> diff
>
>

Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Steve Loughran <st...@hortonworks.com>.
On 9 Feb 2016, at 11:26, Steve Loughran <st...@hortonworks.com> wrote:


On 9 Feb 2016, at 05:55, Prabhu Joseph <pr...@gmail.com> wrote:

+ Spark-Dev

On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <pr...@gmail.com> wrote:
Hi All,

    A long running Spark job on YARN throws below exception after running for few days.

yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.YarnException: No AMRMToken found for user prabhu
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)

Do any of the below renew the AMRMToken and solve the issue

1. yarn.resourcemanager.delegation.token.max-lifetime increase from 7 days

2. Configuring Proxy user:

<property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value> </property>
<property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value> </property>

wouldn't do that: security issues


3. Can Spark-1.4.0 handle with fix https://issues.apache.org/jira/browse/SPARK-5342

    spark.yarn.credentials.file



I'll say "maybe" there

uprated to a no, having looked at the code more


How to renew the AMRMToken for a long running job on YARN?




AMRM token renewal should be automatic in AM; YARN sends a message to the AM (actually an allocate() response with no containers but a new token at the tail of the message).

I don't see any logging in the Hadoop code there (AMRMClientImpl); filed YARN-4682 to add a log statement
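
For anyone following along, the renewal path can be pictured with a small Python sketch. This is a toy model under stated assumptions, not the AMRMClientImpl code: ToyAMRMClient, AllocateResponse and the am_rm_token field are hypothetical names, and the log line only illustrates the kind of statement YARN-4682 asks for.

```python
import logging
from dataclasses import dataclass, field
from typing import List, Optional

log = logging.getLogger("toy.AMRMClient")


@dataclass
class AllocateResponse:
    """Hypothetical allocate() reply: containers plus an optional new token."""
    containers: List[str] = field(default_factory=list)
    am_rm_token: Optional[str] = None  # set only when the RM issues a new token


class ToyAMRMClient:
    """Hypothetical AM-side client that swaps in a token sent by the RM."""

    def __init__(self, initial_token: str) -> None:
        self.token = initial_token

    def on_allocate_response(self, response: AllocateResponse) -> bool:
        """Return True if the response carried a new token and it was installed."""
        if response.am_rm_token is None:
            return False
        # Illustrative log line: makes a silent token swap visible in the AM log.
        log.info("Received new AMRMToken in allocate() response; replacing the old one")
        self.token = response.am_rm_token
        return True
```

The point of the sketch is that the swap happens as a side effect of an ordinary heartbeat, so without a log line there is nothing in the AM log to show whether a new token ever arrived.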

if someone other than me were to supply a patch to that JIRA to add a log statement *by the end of the day* I'll review it and get it in to Hadoop 2.8


like I said: I'll get this in to hadoop-2.8 if someone is timely with the diff


Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Steve Loughran <st...@hortonworks.com>.
On 9 Feb 2016, at 05:55, Prabhu Joseph <pr...@gmail.com> wrote:

+ Spark-Dev

On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <pr...@gmail.com> wrote:
Hi All,

    A long running Spark job on YARN throws below exception after running for few days.

yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
org.apache.hadoop.yarn.exceptions.YarnException: No AMRMToken found for user prabhu
at org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)

Do any of the below renew the AMRMToken and solve the issue

1. yarn.resourcemanager.delegation.token.max-lifetime increase from 7 days

2. Configuring Proxy user:

<property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value> </property>
<property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value> </property>

wouldn't do that: security issues


3. Can Spark-1.4.0 handle with fix https://issues.apache.org/jira/browse/SPARK-5342

    spark.yarn.credentials.file



I'll say "maybe" there

How to renew the AMRMToken for a long running job on YARN?




AMRM token renewal should be automatic in AM; YARN sends a message to the AM (actually an allocate() response with no containers but a new token at the tail of the message).

I don't see any logging in the Hadoop code there (AMRMClientImpl); filed YARN-4682 to add a log statement

if someone other than me were to supply a patch to that JIRA to add a log statement *by the end of the day* I'll review it and get it in to Hadoop 2.8

-Steve

Re: Long running Spark job on YARN throws "No AMRMToken"

Posted by Prabhu Joseph <pr...@gmail.com>.
+ Spark-Dev

On Tue, Feb 9, 2016 at 10:04 AM, Prabhu Joseph <pr...@gmail.com>
wrote:

> Hi All,
>
>     A long running Spark job on YARN throws below exception after running
> for few days.
>
> yarn.ApplicationMaster: Reporter thread fails 1 time(s) in a row.
> org.apache.hadoop.yarn.exceptions.YarnException: *No AMRMToken found* for
> user prabhu at
> org.apache.hadoop.yarn.ipc.RPCUtil.getRemoteException(RPCUtil.java:45)
>
> Do any of the below renew the AMRMToken and solve the issue
>
> 1. yarn.resourcemanager.delegation.token.max-lifetime increase from 7 days
>
> 2. Configuring Proxy user:
>
> <property> <name>hadoop.proxyuser.yarn.hosts</name> <value>*</value>
> </property>
> <property> <name>hadoop.proxyuser.yarn.groups</name> <value>*</value>
> </property>
>
> 3. Can Spark-1.4.0 handle with fix
> https://issues.apache.org/jira/browse/SPARK-5342
>
>     spark.yarn.credentials.file
>
>
> How to renew the AMRMToken for a long running job on YARN?
>
>
> Thanks,
> Prabhu Joseph
>
>
>
>
>
