You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Rainer Toebbicke <rt...@pclella.cern.ch> on 2013/12/02 17:25:53 UTC

Client mapred tries to renew a token with renewer specified as nobody

Hello,

I am trying to understand why my long-running mapreduce jobs stop after 24 hours (approx) on a secure cluster.

This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn), authentication specified as "kerberos". Trying with a short-lived Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.

On a test cluster with increased logging and shortened dfs.namenode.delegation.token.renew-interval (for quicker debugging) I see that an immediate renew of the delegation token fails, and then after the expiry period the Namenode log starts getting clobbered.

Detail:


2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke

2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)



Is this as unhealthy as it looks? If the first (immediate) renewal fails I assume others will share the same fate. Would that explain the 24-hour-lifetime on the "real" cluster and what could be the reason? How does "nobody" come into the game here?

In any case, linked to this or not, after dfs.namenode.delegation.token.renew-interval ms the following is logged a zillion times:

2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
        at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
        at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
        ... 6 more


Any ideas?

Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Well, that does not seem to be the issue. The Kerberos ticket gets refreshed automatically, but the delegation token doesn't.


Le 3 déc. 2013 à 20:24, Raviteja Chirala a écrit :

> Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Well, that does not seem to be the issue. The Kerberos ticket gets refreshed automatically, but the delegation token doesn't.


Le 3 déc. 2013 à 20:24, Raviteja Chirala a écrit :

> Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Well, that does not seem to be the issue. The Kerberos ticket gets refreshed automatically, but the delegation token doesn't.


Le 3 déc. 2013 à 20:24, Raviteja Chirala a écrit :

> Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Well, that does not seem to be the issue. The Kerberos ticket gets refreshed automatically, but the delegation token doesn't.


Le 3 déc. 2013 à 20:24, Raviteja Chirala a écrit :

> Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Raviteja Chirala <rt...@gmail.com>.
Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 

—
Sent from Mailbox for iPad

On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke <rt...@pclella.cern.ch>
wrote:

> Hello,
> I am trying to understand why my long-running mapreduce jobs stop after 24 hours (approx) on a secure cluster.
> This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn), authentication specified as "kerberos". Trying with a short-lived Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.
> On a test cluster with increased logging and shortened dfs.namenode.delegation.token.renew-interval (for quicker debugging) I see that an immediate renew of the delegation token fails, and then after the expiry period the Namenode log starts getting clobbered.
> Detail:
> 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
> 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
> 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke
> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
> Is this as unhealthy as it looks? If the first (immediate) renewal fails I assume others will share the same fate. Would that explain the 24-hour-lifetime on the "real" cluster and what could be the reason? How does "nobody" come into the game here?
> In any case, linked to this or not, after dfs.namenode.delegation.token.renew-interval ms the following is logged a zillion times:
> 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
> 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>         ... 6 more
> Any ideas?
> Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Daryn Sharp <da...@yahoo-inc.com>.
We encountered the same issue in yarn's RM so I made the RM recognize it's own tokens and renew them regardless of the renewer.  In part because older versions of oozie hardcode the renewer as "mrtoken".  I thought the change made it back into 1.x JT but I guess not.

I agree that the conversion to short name is occurring on the "wrong side of the fence".  Auth_to_local is a function of the server, not the client.  Please file a jira.

Dfsclient also shouldn't be repeatedly retrying when the connection fails with InvalidToken.  Hopefully the client didn't try more than 45 times (to clarify your "gazillion" from an earlier message).   Some exceptions are transient, but an invalid token is game over because tokens don't ever become valid again.  HA used to rely on InvalidToken to trigger retries in case the standby is in the processing of catching up on edits, but that's been converted to a RetriableException during a transition to active.   Please file another jira.

I would also expect  _HOST to expand correctly since a client may need to submit to multiple JTs and not want to hassle with using different configs.  Would you like to file another jira? :)

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <rt...@rtb-big-mac.cern.ch> wrote:

> 
> Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :
> 
>> 
>> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>> 
>>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>> 
> 
> Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.
> 
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 
> 
> Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.
> 
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.
> 
> Thanks, Rainer


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Daryn Sharp <da...@yahoo-inc.com>.
We encountered the same issue in yarn's RM so I made the RM recognize it's own tokens and renew them regardless of the renewer.  In part because older versions of oozie hardcode the renewer as "mrtoken".  I thought the change made it back into 1.x JT but I guess not.

I agree that the conversion to short name is occurring on the "wrong side of the fence".  Auth_to_local is a function of the server, not the client.  Please file a jira.

Dfsclient also shouldn't be repeatedly retrying when the connection fails with InvalidToken.  Hopefully the client didn't try more than 45 times (to clarify your "gazillion" from an earlier message).   Some exceptions are transient, but an invalid token is game over because tokens don't ever become valid again.  HA used to rely on InvalidToken to trigger retries in case the standby is in the processing of catching up on edits, but that's been converted to a RetriableException during a transition to active.   Please file another jira.

I would also expect  _HOST to expand correctly since a client may need to submit to multiple JTs and not want to hassle with using different configs.  Would you like to file another jira? :)

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <rt...@rtb-big-mac.cern.ch> wrote:

> 
> Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :
> 
>> 
>> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>> 
>>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>> 
> 
> Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.
> 
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 
> 
> Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.
> 
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.
> 
> Thanks, Rainer


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Daryn Sharp <da...@yahoo-inc.com>.
We encountered the same issue in yarn's RM so I made the RM recognize it's own tokens and renew them regardless of the renewer.  In part because older versions of oozie hardcode the renewer as "mrtoken".  I thought the change made it back into 1.x JT but I guess not.

I agree that the conversion to short name is occurring on the "wrong side of the fence".  Auth_to_local is a function of the server, not the client.  Please file a jira.

Dfsclient also shouldn't be repeatedly retrying when the connection fails with InvalidToken.  Hopefully the client didn't try more than 45 times (to clarify your "gazillion" from an earlier message).   Some exceptions are transient, but an invalid token is game over because tokens don't ever become valid again.  HA used to rely on InvalidToken to trigger retries in case the standby is in the processing of catching up on edits, but that's been converted to a RetriableException during a transition to active.   Please file another jira.

I would also expect  _HOST to expand correctly since a client may need to submit to multiple JTs and not want to hassle with using different configs.  Would you like to file another jira? :)

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <rt...@rtb-big-mac.cern.ch> wrote:

> 
> Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :
> 
>> 
>> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>> 
>>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>> 
> 
> Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.
> 
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 
> 
> Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.
> 
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.
> 
> Thanks, Rainer


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Daryn Sharp <da...@yahoo-inc.com>.
We encountered the same issue in yarn's RM so I made the RM recognize it's own tokens and renew them regardless of the renewer.  In part because older versions of oozie hardcode the renewer as "mrtoken".  I thought the change made it back into 1.x JT but I guess not.

I agree that the conversion to short name is occurring on the "wrong side of the fence".  Auth_to_local is a function of the server, not the client.  Please file a jira.

Dfsclient also shouldn't be repeatedly retrying when the connection fails with InvalidToken.  Hopefully the client didn't try more than 45 times (to clarify your "gazillion" from an earlier message).   Some exceptions are transient, but an invalid token is game over because tokens don't ever become valid again.  HA used to rely on InvalidToken to trigger retries in case the standby is in the processing of catching up on edits, but that's been converted to a RetriableException during a transition to active.   Please file another jira.

I would also expect  _HOST to expand correctly since a client may need to submit to multiple JTs and not want to hassle with using different configs.  Would you like to file another jira? :)

Daryn

On Dec 9, 2013, at 3:13 AM, Rainer Toebbicke <rt...@rtb-big-mac.cern.ch> wrote:

> 
> Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :
> 
>> 
>> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
>> 
>> Thanks,
>> +Vinod
>> 
>> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
>> 
>>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>> 
> 
> Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.
> 
> It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 
> 
> Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.
> 
> One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.
> 
> Thanks, Rainer


Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :

> 
> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
> 
> Thanks,
> +Vinod
> 
> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
> 
>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 

Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.

It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 

Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.

One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.

Thanks, Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :

> 
> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
> 
> Thanks,
> +Vinod
> 
> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
> 
>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 

Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.

It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 

Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.

One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.

Thanks, Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :

> 
> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
> 
> Thanks,
> +Vinod
> 
> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
> 
>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 

Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.

It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 

Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.

One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.

Thanks, Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Rainer Toebbicke <rt...@pclella.cern.ch>.
Le 5 déc. 2013 à 05:30, Vinod Kumar Vavilapalli a écrit :

> 
> It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.
> 
> Thanks,
> +Vinod
> 
> On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:
> 
>> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 

Thanks, I had already guessed so far and finally the problem turned out a subtle bug in the Cloudera "Configuring MRv1 Security" instructions: they recommend to specify mapreduce.jobtracker.kerberos.principal as mapred/_HOST@YOUR-REALM.COM in mapred-site.xml. That won't work.

It confuses TokenCache.obtainTokensForNamenodesInternal() when it tries to obtain mapred's HadoopKerberosName.getShortname(): it finds the untranslated _HOST which a correctly configured hadoop.security.auth_to_local would not recognize as valid cluster node. That's where "nobody" comes in. 

Solution: mapreduce.jobtracker.kerberos.principal should really be full a fully resolved mapred/jobtracker-fqdn@YOUR-REALM.COM and thus would usually pass the hadoop.security.auth_to_local rules.

One could argue that TokenCache.obtainTokensForNamenodesInternal() actually uses getShortname() from the wrong side of the fence but the little change fixes the problem.

Thanks, Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.

Thanks,
+Vinod

On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:

> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.

Thanks,
+Vinod

On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:

> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.

Thanks,
+Vinod

On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:

> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.
It is clearly mentioning that the renewer is wrong (renewer marked is 'nobody' but mapred is trying to renew the token), you may want to check this.

Thanks,
+Vinod

On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote:

> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody


-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Raviteja Chirala <rt...@gmail.com>.
Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 

—
Sent from Mailbox for iPad

On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke <rt...@pclella.cern.ch>
wrote:

> Hello,
> I am trying to understand why my long-running mapreduce jobs stop after 24 hours (approx) on a secure cluster.
> This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn), authentication specified as "kerberos". Trying with a short-lived Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.
> On a test cluster with increased logging and shortened dfs.namenode.delegation.token.renew-interval (for quicker debugging) I see that an immediate renew of the delegation token fails, and then after the expiry period the Namenode log starts getting clobbered.
> Detail:
> 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
> 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
> 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke
> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
> Is this as unhealthy as it looks? If the first (immediate) renewal fails I assume others will share the same fate. Would that explain the 24-hour-lifetime on the "real" cluster and what could be the reason? How does "nobody" come into the game here?
> In any case, linked to this or not, after dfs.namenode.delegation.token.renew-interval ms the following is logged a zillion times:
> 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
> 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>         ... 6 more
> Any ideas?
> Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Raviteja Chirala <rt...@gmail.com>.
Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 

—
Sent from Mailbox for iPad

On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke <rt...@pclella.cern.ch>
wrote:

> Hello,
> I am trying to understand why my long-running mapreduce jobs stop after 24 hours (approx) on a secure cluster.
> This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn), authentication specified as "kerberos". Trying with a short-lived Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.
> On a test cluster with increased logging and shortened dfs.namenode.delegation.token.renew-interval (for quicker debugging) I see that an immediate renew of the delegation token fails, and then after the expiry period the Namenode log starts getting clobbered.
> Detail:
> 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
> 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
> 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke
> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
> Is this as unhealthy as it looks? If the first (immediate) renewal fails I assume others will share the same fate. Would that explain the 24-hour-lifetime on the "real" cluster and what could be the reason? How does "nobody" come into the game here?
> In any case, linked to this or not, after dfs.namenode.delegation.token.renew-interval ms the following is logged a zillion times:
> 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
> 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>         ... 6 more
> Any ideas?
> Rainer

Re: Client mapred tries to renew a token with renewer specified as nobody

Posted by Raviteja Chirala <rt...@gmail.com>.
Alternatively you can schedule a cron job to do kinit every 20 hours or so. Just to renew token before it expires. 

—
Sent from Mailbox for iPad

On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke <rt...@pclella.cern.ch>
wrote:

> Hello,
> I am trying to understand why my long-running mapreduce jobs stop after 24 hours (approx) on a secure cluster.
> This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn), authentication specified as "kerberos". Trying with a short-lived Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.
> On a test cluster with increased logging and shortened dfs.namenode.delegation.token.renew-interval (for quicker debugging) I see that an immediate renew of the delegation token fails, and then after the expiry period the Namenode log starts getting clobbered.
> Detail:
> 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
> 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
> 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke
> 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
> org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with renewer specified as nobody
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
>         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
>         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)
> Is this as unhealthy as it looks? If the first (immediate) renewal fails I assume others will share the same fate. Would that explain the 24-hour-lifetime on the "real" cluster and what could be the reason? How does "nobody" come into the game here?
> In any case, linked to this or not, after dfs.namenode.delegation.token.renew-interval ms the following is logged a zillion times:
> 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
> 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
> javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>         at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
>         at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
>         at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
>         at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
> Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
>         at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>         at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>         ... 6 more
> Any ideas?
> Rainer