Posted to common-issues@hadoop.apache.org by "Clay B. (Jira)" <ji...@apache.org> on 2022/09/30 00:36:00 UTC

[jira] [Comment Edited] (HADOOP-16298) Manage/Renew delegation tokens for externally scheduled jobs

    [ https://issues.apache.org/jira/browse/HADOOP-16298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611307#comment-17611307 ] 

Clay B. edited comment on HADOOP-16298 at 9/30/22 12:35 AM:
------------------------------------------------------------

*History:*
 * This work began in 2019
 * This work, which as-is requires manually renewing delegation tokens, has been ported to Hadoop 3.x; some unit tests were written, as in the PR from March 9th.
 * I have had teams run this code a fair amount in a Kubernetes environment, running Flink jobs as long-running HBase 1.x clients, without issue
 * The current code requires invasive changes to all Hadoop client frameworks – to hook into authentication failures and reload credentials on-demand. *This has proved a terrible support experience.*
 * Further, the actual design of having client frameworks hook into authentication failures does not meaningfully support threading models like HBase 2.x's. *Proactive refresh of tokens is needed.*
 * On [~stevel@apache.org]'s awesome feedback that was never addressed:

 ** Steve, thank you so much for +Hadoop and Kerberos: The Madness beyond the Gate+; I've loved learning from it! I see your proposal of "Client-side push of renewed Delegation Tokens" there. While I've seen some well-orchestrated operations and Hadoop application development teams, for our use-case I expect we need to stick close to "standard"/vendor Hadoop application design patterns.
To me, running a small server in a client would be a challenge for services deployed on Kubernetes clusters, as we would need the injector to run in the namespace or set up ingress rules for it. Further, for the use-cases I see, written by hundreds of application teams that are largely infrastructure-ignorant, I appreciate being able to abstract them away from knowing about authentication and to rely on relatively standard Hadoop documentation and training rather than a more custom injection process. (In the deployment model we have been using, keytabs and any long-term authentication credentials are kept out of application team control and outside the K8s namespace.)
Lastly, having the client inject credentials seems like it would require the same UGI changes in the end as having the client re-read credentials off disk? (A minimal sketch of that off-disk re-read follows this list.)
 ** [~Deshpande], when he started this work, looked at {{maxDate}}, but see my comments about HBase (our main use-case) below. I think this can be reconciled, but that seems like a later addition; e.g., try to keep the changes as small and deterministic as possible first.
 ** HBase and Hive (I think it only has WebHCat tokens) are something I can test with; I'm a bit out of my depth with KMS, S3A and ABFS tokens as yet. Thank you for those pointers; maybe you have some ideas on unit tests I could extend to verify them?
I do not understand how SPNEGO auth would be involved here. The intention is that the client is started and run solely with delegation tokens and never does any Kerberos exchanges. Are you thinking of the case where the application does RESTful operations against a SPNEGO-authenticated endpoint? (If so, could you offer an example you envision?)
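
For reference, here is a minimal sketch of what the "client re-reading credentials off disk" alternative looks like with today's public {{Credentials}}/{{UGI}} APIs. The class name and token-file path are hypothetical; deciding *when* to call it is exactly what the rest of this ticket is about:
{code:java}
import java.io.File;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical illustration only.
public class TokenFileReloader {
  public static void reloadTokens(String tokenFilePath, Configuration conf)
      throws IOException {
    // Parse a token storage file, e.g. one written by `hdfs fetchdt` or an external injector.
    Credentials freshCreds =
        Credentials.readTokenStorageFile(new File(tokenFilePath), conf);
    // Merge the refreshed tokens into the current user's credentials so that
    // existing clients pick them up on subsequent calls.
    UserGroupInformation.getCurrentUser().addCredentials(freshCreds);
  }
}
{code}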

*Next Steps:*
_To no longer require changes in client frameworks, I plan to implement a proactive refresh, as is done today for Kerberos tickets._
 * I am now looking to craft a renewal thread in UGI under {{spawnAutoRenewalThreadForUserCreds}}, following the pattern used there for Kerberos, to proactively refresh tokens. A proactive refresh would remove the need to change Hadoop client frameworks at all (a sketch follows this list).
 * The renewal-thread approach will require only one API call in application client code (rather than in client frameworks) to use the feature. Nicely, it would use the same UGI entry points required for long-running Kerberos clients.
 ** At first, I will implement the thread running on a fixed interval rather than querying token expiration.
 ** A challenge for my use-case in automatically determining refresh timing is that HBase delegation tokens extend {{TokenIdentifier}} and carry an {{expirationDate}} field, while Hadoop delegation tokens extend {{AbstractDelegationTokenIdentifier}}, which offers a {{maxDate}} field (see the second sketch below).
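
To make the first bullet concrete, here is a minimal sketch of the kind of fixed-interval reload thread I have in mind, modeled on the Kerberos auto-renewal pattern in UGI. The class name, token-file path, and interval are hypothetical and for illustration only; the real change would live inside {{UserGroupInformation}} next to {{spawnAutoRenewalThreadForUserCreds}}, and this reuses the hypothetical {{TokenFileReloader}} from the earlier sketch:
{code:java}
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.conf.Configuration;

// Hypothetical illustration only; not the actual patch.
public class DelegationTokenAutoReloader {
  public static Thread start(String tokenFilePath, Configuration conf,
      long intervalSeconds) {
    Thread t = new Thread(() -> {
      while (!Thread.currentThread().isInterrupted()) {
        try {
          // Fixed-interval refresh first; expiration-driven scheduling
          // (maxDate vs. expirationDate) can come later.
          TimeUnit.SECONDS.sleep(intervalSeconds);
          // Re-read the token file kept fresh by an external process and
          // merge it into the current UGI (see the earlier sketch).
          TokenFileReloader.reloadTokens(tokenFilePath, conf);
        } catch (InterruptedException ie) {
          Thread.currentThread().interrupt(); // shut down cleanly
        } catch (Exception e) {
          // Tolerate transient read failures; the token file may simply
          // not have been rotated yet.
        }
      }
    }, "delegation-token-auto-reload");
    t.setDaemon(true);
    t.start();
    return t;
  }
}
{code}
Application code would then need only a single call, e.g. {{DelegationTokenAutoReloader.start(tokenFile, conf, 3600)}}, alongside the UGI setup it already does; client frameworks such as the HBase or Hive clients would not change.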
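
On the last bullet, a small sketch of why expiration-driven scheduling is not uniform across token kinds (the helper name is hypothetical): Hadoop tokens expose {{maxDate}} through {{AbstractDelegationTokenIdentifier}}, but HBase's identifier only extends {{TokenIdentifier}}, so its {{expirationDate}} is not reachable through the common superclass:
{code:java}
import java.io.IOException;

import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;
import org.apache.hadoop.security.token.TokenIdentifier;
import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier;

// Hypothetical helper, for illustration only.
public class TokenExpiryInspector {
  public static long earliestMaxDate(Credentials creds) throws IOException {
    long earliest = Long.MAX_VALUE;
    for (Token<? extends TokenIdentifier> token : creds.getAllTokens()) {
      TokenIdentifier id = token.decodeIdentifier(); // may be null for unknown token kinds
      if (id instanceof AbstractDelegationTokenIdentifier) {
        earliest = Math.min(earliest,
            ((AbstractDelegationTokenIdentifier) id).getMaxDate());
      }
      // HBase's token identifier would need its own branch (or reflection)
      // to read its expirationDate.
    }
    return earliest;
  }
}
{code}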


> Manage/Renew delegation tokens for externally scheduled jobs
> ------------------------------------------------------------
>
>                 Key: HADOOP-16298
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16298
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: security
>    Affects Versions: 3.3.0
>            Reporter: Pankaj Deshpande
>            Assignee: Clay B.
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: Proposal for changes to UGI for managing_renewing externally managed delegation tokens.pdf
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> * Presently when jobs are run in the Hadoop ecosystem, the implicit assumption is that YARN will be used as a scheduling agent with access to appropriate keytabs for renewal of Kerberos tickets and delegation tokens. 
>  * Jobs that interact with Kerberized Hadoop services such as HBase/Hive/HDFS and use an external scheduler such as Kubernetes typically do not have access to keytabs. In such cases, delegation tokens are a logical choice for interacting with a Kerberized cluster. These tokens are issued based on some external auth mechanism (such as Kube LDAP authentication).


