Posted to common-dev@hadoop.apache.org by "Nicolas Fraison (JIRA)" <ji...@apache.org> on 2017/03/29 14:45:41 UTC

[jira] [Created] (HADOOP-14252) Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or not in the cache

Nicolas Fraison created HADOOP-14252:
----------------------------------------

             Summary: Nodemanagers have DDoS our namenode due to HDFS_DELEGATION_TOKEN expired or not in the cache
                 Key: HADOOP-14252
                 URL: https://issues.apache.org/jira/browse/HADOOP-14252
             Project: Hadoop Common
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.6.0
         Environment: Releases:
cloudera release cdh-5.5.0
openjdk version "1.8.0_91"
linux centos6 servers

Cluster info:
Namenode and resourcemanager in HA with kerberos authentication
More than 1300 datanodes/nodemanagers
            Reporter: Nicolas Fraison
            Priority: Minor


Our namenode suffered severe slowdowns because all of our nodemanagers kept retrying to renew a lease, reconnecting to the namenode every second for one hour, after an HDFS_DELEGATION_TOKEN had expired or could no longer be found in the cache.
Throughout this period the number of TIME_WAIT connections on the namenode stayed pinned at the configured maximum of 250k, because every retry opened a new connection.

{code}
2017-03-02 11:51:42,817 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156103_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
  2017-03-02 11:51:43,414 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1488396860014_156120_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
  2017-03-02 11:51:51,994 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
  2017-03-02 11:51:51,995 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
  2017-03-02 11:51:51,995 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
  2017-03-02 11:51:51,995 WARN org.apache.hadoop.hdfs.LeaseRenewer: Failed to renew lease for [DFSClient_NONMAPREDUCE_1560141256_4187204] for 30 seconds.  Will retry shortly ...
  token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) is expired
     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
     at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
     at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
     at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
     at java.lang.Thread.run(Thread.java:745)


  2017-03-02 12:51:22,032 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
  2017-03-02 12:51:22,032 WARN org.apache.hadoop.ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
  2017-03-02 12:51:22,033 WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:prediction (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
  2017-03-02 12:51:22,033 WARN org.apache.hadoop.hdfs.DFSClient: Failed to renew lease for DFSClient_NONMAPREDUCE_1560141256_4187204 for 3600 seconds (>= hard-limit =3600 seconds.) Closing all files being written ...
  token (HDFS_DELEGATION_TOKEN token 111018676 for prediction) can't be found in cache
     at org.apache.hadoop.ipc.Client.call(Client.java:1472)
     at org.apache.hadoop.ipc.Client.call(Client.java:1403)
     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
     at com.sun.proxy.$Proxy20.renewLease(Unknown Source)
     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.renewLease(ClientNamenodeProtocolTranslatorPB.java:571)
     at sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)
     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
     at java.lang.reflect.Method.invoke(Method.java:498)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
     at com.sun.proxy.$Proxy21.renewLease(Unknown Source)
     at org.apache.hadoop.hdfs.DFSClient.renewLease(DFSClient.java:921)
     at org.apache.hadoop.hdfs.LeaseRenewer.renew(LeaseRenewer.java:423)
     at org.apache.hadoop.hdfs.LeaseRenewer.run(LeaseRenewer.java:448)
     at org.apache.hadoop.hdfs.LeaseRenewer.access$700(LeaseRenewer.java:71)
     at org.apache.hadoop.hdfs.LeaseRenewer$1.run(LeaseRenewer.java:304)
     at java.lang.Thread.run(Thread.java:745)
  2017-03-02 12:51:27,364 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl: rollingMonitorInterval is set as -1. The log rolling mornitoring interval is disabled. The logs will be aggregated after this application is finished.
{code}

The root cause was the removal of the yarn proxy configuration, which left the resource manager unable to renew the HDFS_DELEGATION_TOKEN.
Even with the root cause identified, I don't think it is normal to retry a lease renewal every second for an hour when the token is expired or cannot be found, because retrying cannot recover from that error.
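
To make the expected behaviour concrete, here is a minimal sketch of a retry loop that gives up as soon as it sees a failure that retrying cannot fix, and backs off for anything else. It is a hypothetical illustration in plain Java, not the actual LeaseRenewer code: LeaseRetrySketch, NonRecoverableException, Result and renewWithBackoff are names invented for this example, and the exponential backoff for other failures is only an assumption about what a gentler policy could look like.

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch; none of these names are Hadoop APIs. */
final class LeaseRetrySketch {

    /** Stand-in for a failure that retrying cannot fix, e.g. an expired or unknown token. */
    static final class NonRecoverableException extends Exception {
        NonRecoverableException(String message) {
            super(message);
        }
    }

    /** Outcome of the loop: how many calls were made and whether one succeeded. */
    static final class Result {
        final int attempts;
        final boolean succeeded;

        Result(int attempts, boolean succeeded) {
            this.attempts = attempts;
            this.succeeded = succeeded;
        }
    }

    /**
     * Calls renew up to maxAttempts times. A NonRecoverableException ends the
     * loop at once; any other failure is retried after a delay that doubles
     * each time, capped at maxDelayMs.
     */
    static Result renewWithBackoff(Callable<Void> renew, int maxAttempts,
                                   long initialDelayMs, long maxDelayMs)
            throws InterruptedException {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                renew.call();
                return new Result(attempt, true);
            } catch (NonRecoverableException e) {
                // Retrying cannot help here, so stop instead of reconnecting again.
                return new Result(attempt, false);
            } catch (InterruptedException e) {
                // Let the caller handle interruption rather than treating it as a failure.
                throw e;
            } catch (Exception e) {
                // Possibly transient: wait before the next attempt, then back off.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMs);
                    delayMs = Math.min(delayMs * 2, maxDelayMs);
                }
            }
        }
        return new Result(maxAttempts, false);
    }
}
```

With a policy of this shape, a renewal that fails because the token is expired would contact the server once and then stop, instead of reconnecting every second until the one-hour hard limit is reached.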



