Posted to yarn-dev@hadoop.apache.org by "kyungwan nam (Jira)" <ji...@apache.org> on 2020/06/02 06:56:00 UTC

[jira] [Created] (YARN-10305) Lost system-credentials when restarting RM

kyungwan nam created YARN-10305:
-----------------------------------

             Summary: Lost system-credentials when restarting RM
                 Key: YARN-10305
                 URL: https://issues.apache.org/jira/browse/YARN-10305
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: kyungwan nam
            Assignee: kyungwan nam


System-credentials were introduced in YARN-2704 to keep long-running apps working.
I ran into a situation where the system-credentials were lost when the RM was restarted.
After that, if an app’s AM stops, restarting the AM fails because the NMs do not have the HDFS delegation token, which is needed for resource localization.


The app has a couple of delegation tokens, including a timeline-server token and an HDFS delegation token.
When the RM restarts, it requests a new HDFS delegation token for an app that was submitted long ago. (This was fixed by YARN-5098.)
However, if an app has a couple of delegation tokens and an exception occurs while the first token is being processed, the remaining tokens are not processed.
I think that is why the system-credentials were lost.
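
To illustrate the suspected failure mode, here is a minimal, self-contained Java sketch. It is not the actual DelegationTokenRenewer code: FakeToken, renew, processFailFast and processPerToken are hypothetical names made up for this example, and the per-token variant is only one possible way to avoid the problem, not necessarily the fix for this issue. It only models the behavior described above: the expired timeline token comes first, so a single try block around the whole loop never reaches the HDFS token.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TokenRecoverySketch {

    /** Hypothetical stand-in for a delegation token; not a Hadoop class. */
    static final class FakeToken {
        final String kind;
        final boolean expired;

        FakeToken(String kind, boolean expired) {
            this.kind = kind;
            this.expired = expired;
        }
    }

    /** Simulated renewal: fails for a token that is past its max lifetime. */
    static void renew(FakeToken token) throws IOException {
        if (token.expired) {
            throw new IOException("Failed to renew token: Kind: " + token.kind);
        }
    }

    /** One try block around the loop: the first failure skips all later tokens. */
    static List<String> processFailFast(List<FakeToken> tokens) {
        List<String> processed = new ArrayList<>();
        try {
            for (FakeToken token : tokens) {
                renew(token);
                processed.add(token.kind);
            }
        } catch (IOException e) {
            // Recovery for the whole app stops here; remaining tokens are never seen.
        }
        return processed;
    }

    /** Try block per token: a failure only skips the token that failed. */
    static List<String> processPerToken(List<FakeToken> tokens) {
        List<String> processed = new ArrayList<>();
        for (FakeToken token : tokens) {
            try {
                renew(token);
                processed.add(token.kind);
            } catch (IOException e) {
                // Skip this token only and continue with the next one.
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        // Assumed ordering: the expired timeline token is processed first.
        List<FakeToken> tokens = Arrays.asList(
                new FakeToken("TIMELINE_DELEGATION_TOKEN", true),
                new FakeToken("HDFS_DELEGATION_TOKEN", false));

        System.out.println("fail-fast: " + processFailFast(tokens));
        System.out.println("per-token: " + processPerToken(tokens));
    }
}
```

In this model the fail-fast variant processes no tokens at all, while the per-token variant still reaches the HDFS delegation token.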

Here are the RM’s logs from the time of the restart.
{code}
2020-05-19 14:25:05,712 WARN  security.DelegationTokenRenewer (DelegationTokenRenewer.java:handleDTRenewerAppRecoverEvent(955)) - Unable to add the application to the delegation token renewer on recovery.
java.io.IOException: Failed to renew token: Kind: TIMELINE_DELEGATION_TOKEN, Service: 10.1.1.1:8190, Ident: (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:503)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleDTRenewerAppRecoverEvent(DelegationTokenRenewer.java:953)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.access$700(DelegationTokenRenewer.java:79)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$DelegationTokenRenewerRunnable.run(DelegationTokenRenewer.java:912)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: HTTP status [403], message [org.apache.hadoop.security.token.SecretManager$InvalidToken: yarn tried to renew an expired token (TIMELINE_DELEGATION_TOKEN owner=test-admin, renewer=yarn, realUser=yarn, issueDate=1586136363258, maxDate=1587000363258, sequenceNumber=2193, masterKeyId=340) max expiration date: 2020-04-16 10:26:03,258+0900 currentTime: 2020-05-19 14:25:05,700+0900]
        at org.apache.hadoop.util.HttpExceptionUtils.validateResponse(HttpExceptionUtils.java:166)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:319)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.renewDelegationToken(DelegationTokenAuthenticator.java:235)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.renewDelegationToken(DelegationTokenAuthenticatedURL.java:437)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:247)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl$2.run(TimelineClientImpl.java:227)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientRetryOpForOperateDelegationToken.run(TimelineConnector.java:431)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:334)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector.operateDelegationToken(TimelineConnector.java:218)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.renewDelegationToken(TimelineClientImpl.java:250)
        at org.apache.hadoop.yarn.security.client.TimelineDelegationTokenIdentifier$Renewer.renew(TimelineDelegationTokenIdentifier.java:81)
        at org.apache.hadoop.security.token.Token.renew(Token.java:512)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:629)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer$1.run(DelegationTokenRenewer.java:626)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.renewToken(DelegationTokenRenewer.java:625)
        at org.apache.hadoop.yarn.server.resourcemanager.security.DelegationTokenRenewer.handleAppSubmitEvent(DelegationTokenRenewer.java:489)
        ... 6 more

{code}





--
This message was sent by Atlassian Jira
(v8.3.4#803005)
