Posted to yarn-issues@hadoop.apache.org by "Yesha Vora (JIRA)" <ji...@apache.org> on 2016/05/17 00:44:12 UTC

[jira] [Updated] (YARN-5098) Yarn Application log Aggregation fails because the NM cannot get the correct HDFS delegation token

     [ https://issues.apache.org/jira/browse/YARN-5098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora updated YARN-5098:
-----------------------------
    Description: 
Scenario (HA-env):
* set dfs.namenode.delegation.token.max-lifetime=43200000 and dfs.namenode.delegation.token.renew-interval=28800000
* Start long running applications
* Let these applications run for ~3 days
* After 3 days, kill the applications
* Try to get application logs for the above long-running apps.
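With these settings, the delegation token's maximum lifetime is 12 hours and its renew interval is 8 hours, so an application running for ~3 days outlives the token entirely. A quick sanity check of the arithmetic (illustrative only, values converted from the milliseconds above):

```python
# Delegation-token settings from the scenario above (values in milliseconds).
MAX_LIFETIME_MS = 43200000    # dfs.namenode.delegation.token.max-lifetime
RENEW_INTERVAL_MS = 28800000  # dfs.namenode.delegation.token.renew-interval

MS_PER_HOUR = 60 * 60 * 1000

max_lifetime_h = MAX_LIFETIME_MS / MS_PER_HOUR      # 12.0 hours
renew_interval_h = RENEW_INTERVAL_MS / MS_PER_HOUR  # 8.0 hours
app_runtime_h = 3 * 24  # the long-running apps run for ~3 days = 72 hours

# The token can be renewed every 8 hours, but never past its 12-hour
# max lifetime, so a 72-hour application outlives it several times over.
print(max_lifetime_h, renew_interval_h, app_runtime_h > max_lifetime_h)
```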

However, YARN application logs for the long-running applications could not be gathered because the NodeManager failed to talk to HDFS with the error below.
{code}
2016-05-16 18:18:28,533 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(555)) - Application just finished : application_1463170334122_0002
2016-05-16 18:18:28,545 WARN  ipc.Client (Client.java:run(705)) - Exception encountered while connecting to the server :
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 171 for hrt_qa) can't be found in cache
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:398)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:752)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:747)
        at org.apache.hadoop.ipc.Client$Connection.access$3100(Client.java:398)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1597)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        at org.apache.hadoop.ipc.Client.call(Client.java:1386)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:240)
        at com.sun.proxy.$Proxy83.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:282)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy84.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1018)
        at org.apache.hadoop.fs.Hdfs.getServerDefaults(Hdfs.java:156)
        at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:550)
        at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:687)
{code}
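The InvalidToken message reflects the NameNode's secret manager having expired the token from its cache once the max lifetime passed. A minimal toy model of that behavior (this is not Hadoop's actual SecretManager; the class and method names here are illustrative):

```python
import time

class ToyTokenCache:
    """Toy stand-in for the NameNode's delegation-token cache.

    A token is evicted once its max lifetime elapses; a lookup after
    that point fails the same way the NodeManager's log upload did.
    """
    def __init__(self, max_lifetime_s):
        self.max_lifetime_s = max_lifetime_s
        self._issued = {}  # token id -> issue timestamp (seconds)

    def issue(self, token_id, now=None):
        self._issued[token_id] = time.time() if now is None else now

    def check(self, token_id, now=None):
        now = time.time() if now is None else now
        issued = self._issued.get(token_id)
        if issued is None or now - issued > self.max_lifetime_s:
            self._issued.pop(token_id, None)  # evict the expired token
            raise KeyError(f"token ({token_id}) can't be found in cache")

# 12-hour max lifetime, matching the scenario configuration above.
cache = ToyTokenCache(max_lifetime_s=12 * 3600)
cache.issue("HDFS_DELEGATION_TOKEN 171", now=0)
cache.check("HDFS_DELEGATION_TOKEN 171", now=8 * 3600)  # still valid at 8h
try:
    cache.check("HDFS_DELEGATION_TOKEN 171", now=72 * 3600)  # ~3 days later
except KeyError as e:
    print(e.args[0])
```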

  was:
Scenario:
* set dfs.namenode.delegation.token.max-lifetime=43200000 and dfs.namenode.delegation.token.renew-interval=28800000
* Start long running applications
* Let these applications run for ~3 days
* After 3 days, kill the applications
* Try to get application logs for the above long-running apps.

However, YARN application logs for the long-running applications could not be gathered because the NodeManager failed to talk to HDFS with the error below.
{code}
2016-05-16 18:18:28,533 INFO  logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(555)) - Application just finished : application_1463170334122_0002
2016-05-16 18:18:28,545 WARN  ipc.Client (Client.java:run(705)) - Exception encountered while connecting to the server :
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 171 for hrt_qa) can't be found in cache
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:375)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:583)
        at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:398)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:752)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:748)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1719)
        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:747)
        at org.apache.hadoop.ipc.Client$Connection.access$3100(Client.java:398)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1597)
        at org.apache.hadoop.ipc.Client.call(Client.java:1439)
        at org.apache.hadoop.ipc.Client.call(Client.java:1386)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:240)
        at com.sun.proxy.$Proxy83.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getServerDefaults(ClientNamenodeProtocolTranslatorPB.java:282)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
        at com.sun.proxy.$Proxy84.getServerDefaults(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getServerDefaults(DFSClient.java:1018)
        at org.apache.hadoop.fs.Hdfs.getServerDefaults(Hdfs.java:156)
        at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:550)
        at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:687)
{code}


> Yarn Application log Aggregation fails because the NM cannot get the correct HDFS delegation token
> --------------------------------------------------------------------------------------------------
>
>                 Key: YARN-5098
>                 URL: https://issues.apache.org/jira/browse/YARN-5098
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Yesha Vora
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org