Posted to dev@twill.apache.org by "Alvin Wang (JIRA)" <ji...@apache.org> on 2014/11/04 02:26:33 UTC
[jira] [Commented] (TWILL-106) HDFS delegation token is not being refreshed properly
[ https://issues.apache.org/jira/browse/TWILL-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14195525#comment-14195525 ]
Alvin Wang commented on TWILL-106:
----------------------------------
I tested a simple Twill app that writes to an HDFS file every ~10 seconds and was able to reproduce this issue. According to UserGroupInformation.getCurrentUser().getTokens(), the HDFS delegation token is properly updated every 5 minutes, as expected (Twill schedules the update for 5 minutes less than dfs.namenode.delegation.token.renew-interval).
* After running for ~12 hours, the Twill app prints "WARN o.a.h.security.UserGroupInformation - Exception encountered while running the renewal command. Aborting renew thread. org.apache.hadoop.util.Shell$ExitCodeException: kinit: Ticket expired while renewing credentials".
* After running for < 24 hours, the Twill app repeatedly prints "ERROR examples.HelloWorld - Error org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token XX for yarn) is expired".
* After running for ~24 hours, the Twill app repeatedly prints "ERROR examples.HelloWorld - Error org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token XX for yarn) can't be found in cache".
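The timing of these messages lines up with the lifetimes in the cluster configuration quoted below. A back-of-the-envelope sketch (the 300000 ms offset is Twill's 5-minute head start on the renew interval; the 24-hour figure is the "Maximum ticket life" from the getprinc output):

```shell
# Values taken from this cluster's configuration (see below).
renew_interval_ms=600000                          # dfs.namenode.delegation.token.renew-interval (10 min)
twill_refresh_ms=$((renew_interval_ms - 300000))  # Twill renews 5 minutes ahead of expiry
tgt_lifetime_h=24                                 # Maximum ticket life: 1 day; renewable life: 0

echo "secure store refresh: every $((twill_refresh_ms / 60000)) min while the TGT is valid"
echo "after ${tgt_lifetime_h}h the TGT expires and cannot be renewed (renewable life 0),"
echo "so refreshes stop and the last token expires within $((renew_interval_ms / 60000)) min"
```

This matches the observed sequence: refreshes work until the TGT dies, then the delegation token expires shortly afterwards, and once it is purged from the NameNode's cache the error changes to "can't be found in cache".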
The first WARN ("Ticket expired while renewing credentials") is likely due to the Kerberos ticket's renewable life of 0. Hadoop spawns a Kerberos ticket renewal thread (UserGroupInformation.spawnAutoRenewalThreadForUserCreds()) that renews the Kerberos ticket via "kinit -R" and then calls reloginFromTicketCache(). I tried kinit with the relevant principal/keytab and could not renew via "kinit -R", apparently because the renewable life was 0.
With the same simple Twill app and a 7-day renewable life, the app can run for longer than 24 hours without hitting either the "expired" or the "can't be found in cache" errors.
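For reference, the renewable life can be raised along these lines (a sketch only; exact commands depend on the KDC). The principal names are taken from the getprinc output in this comment, and "7 days" matches the renewable life used for the successful run. Note that the krbtgt principal's maximum renewable life caps every ticket in the realm, so it must be raised along with the service principal:

```shell
# Sketch: raise the maximum renewable life on the KDC for both the realm
# TGT principal and the yarn service principal.
kadmin.local -q 'modprinc -maxrenewlife "7 days" krbtgt/CONTINUUITY.NET@CONTINUUITY.NET'
kadmin.local -q 'modprinc -maxrenewlife "7 days" yarn/cdap-secure120-1000.dev.continuuity.net@CONTINUUITY.NET'
# Existing tickets keep their old (zero) renewable window; a fresh kinit is
# needed before "kinit -R" will succeed.
```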
Cluster configuration:
{code}
HDP 2.0
Hadoop 2.2.0.2.0.11.0-1 (source with checksum 4e0bbc06297bf19ac5705dc7ffcdb)
dfs.namenode.delegation.key.update-interval: 86400000 (1 day, default)
dfs.namenode.delegation.token.max-lifetime: 604800000 (1 week, default)
dfs.namenode.delegation.token.renew-interval: 600000 (10 minutes)
{code}
kadmin.local: getprinc krbtgt/CONTINUUITY.NET@CONTINUUITY.NET
{code}
Principal: krbtgt/CONTINUUITY.NET@CONTINUUITY.NET
Expiration date: [never]
Last password change: [never]
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Sun Nov 02 08:14:12 UTC 2014 (root/admin@CONTINUUITY.NET)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 4
Key: vno 1, aes256-cts-hmac-sha1-96, no salt
Key: vno 1, aes128-cts-hmac-sha1-96, no salt
Key: vno 1, des3-cbc-sha1, no salt
Key: vno 1, arcfour-hmac, no salt
MKey: vno 1
Attributes:
Policy: [none]
{code}
kadmin.local: getprinc yarn/cdap-secure120-1000.dev.continuuity.net@CONTINUUITY.NET
{code}
Principal: yarn/cdap-secure120-1000.dev.continuuity.net@CONTINUUITY.NET
Expiration date: [never]
Last password change: Tue Sep 23 04:50:48 UTC 2014
Password expiration date: [none]
Maximum ticket life: 1 day 00:00:00
Maximum renewable life: 0 days 00:00:00
Last modified: Sun Nov 02 08:25:12 UTC 2014 (root/admin@CONTINUUITY.NET)
Last successful authentication: [never]
Last failed authentication: [never]
Failed password attempts: 0
Number of keys: 4
Key: vno 2, aes256-cts-hmac-sha1-96, no salt
Key: vno 2, aes128-cts-hmac-sha1-96, no salt
Key: vno 2, des3-cbc-sha1, no salt
Key: vno 2, arcfour-hmac, no salt
MKey: vno 1
Attributes:
Policy: [none]
{code}
/etc/krb5.conf
{code}
# Generated by Chef for cdap-secure120-1000.dev.continuuity.net
# Local modifications will be overwritten.
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = CONTINUUITY.NET
dns_lookup_realm = false
dns_lookup_kdc = true
forwardable = true
renew_lifetime = 1825d
ticket_lifetime = 24h
[realms]
CONTINUUITY.NET = {
kdc = test-kdc481-1000.dev.continuuity.net
admin_server = test-kdc481-1000.dev.continuuity.net
}
[domain_realm]
continuuity.net = CONTINUUITY.NET
.continuuity.net = CONTINUUITY.NET
[appdefaults]
pam = {
debug = false
forwardable = true
renew_lifetime = 1825d
ticket_lifetime = 24h
krb4_convert = false
}
{code}
> HDFS delegation token is not being refreshed properly
> -----------------------------------------------------
>
> Key: TWILL-106
> URL: https://issues.apache.org/jira/browse/TWILL-106
> Project: Apache Twill
> Issue Type: Bug
> Components: core
> Affects Versions: 0.4.0-incubating
> Reporter: Poorna Chandra
>
> We have a Twill app that runs in a secure Hadoop cluster. The app starts up fine and runs for a day. I can see logs saying that the secure store was updated regularly. However, after a day I see exceptions saying "token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache".
> Exception:
> -------------
> 2014-10-23T04:12:42,101Z ERROR c.c.t.TransactionManager [cdap-secure120-1000.dev.continuuity.net] [tx-snapshot] TransactionManager:abortService(TransactionManager.java:594) - Aborting transaction manager due to: Snapshot (timestamp 1414037562088) failed due to: token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> org.apache.hadoop.ipc.RemoteException: token (HDFS_DELEGATION_TOKEN token 4287 for yarn) can't be found in cache
> at org.apache.hadoop.ipc.Client.call(Client.java:1347)
> ...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)