Posted to common-issues@hadoop.apache.org by "Wei-Chiu Chuang (Jira)" <ji...@apache.org> on 2020/01/16 01:05:00 UTC

[jira] [Commented] (HADOOP-14441) LoadBalancingKMSClientProvider#addDelegationTokens should add delegation tokens from all KMS instances

    [ https://issues.apache.org/jira/browse/HADOOP-14441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016433#comment-17016433 ] 

Wei-Chiu Chuang commented on HADOOP-14441:
------------------------------------------

HADOOP-14445 is an extensive change, and getting it into Hadoop 2.x would require a good number of cycles. I am hesitant to backport HADOOP-14445 to 2.x.

For anyone on 2.x who still wants to resolve this issue, HADOOP-14441 is a simpler approach. There was a bug in the 004 patch, so I attached a 005 patch to fix it.

This "simpler" approach has two downsides:
(1) It acquires one delegation token from each KMS instance, so the number of delegation tokens grows with the number of KMS instances. In a busy or large cluster it can grow so much that ZooKeeper (the delegation token store) is overwhelmed.
(2) If an application acquires delegation tokens while KMS1 is down, it will only acquire the delegation token from KMS2. If KMS1 later comes back but KMS2 goes down, the application will fail to access KMS. This is a likely scenario during a cluster rolling restart.
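
To make the two downsides concrete, here is a minimal toy model in Java. It is only a sketch: it uses no Hadoop classes, and every name in it (PerInstanceTokenSketch, acquireTokens, canAccess, totalTokens) is hypothetical and not part of any patch. A "token" is represented simply by the name of the KMS instance that issued it, on the assumption that a token from one instance is not accepted by another.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: no Hadoop classes are used and all names are hypothetical. */
class PerInstanceTokenSketch {

    /** Acquire one token from every KMS instance that is currently reachable. */
    static Set<String> acquireTokens(List<String> instances, Set<String> down) {
        Set<String> tokens = new HashSet<>();
        for (String kms : instances) {
            if (!down.contains(kms)) {
                tokens.add(kms); // a token is modelled as the issuing instance's name
            }
        }
        return tokens;
    }

    /** A request succeeds only if some instance is up AND we hold a token for it. */
    static boolean canAccess(List<String> instances, Set<String> down, Set<String> tokens) {
        for (String kms : instances) {
            if (!down.contains(kms) && tokens.contains(kms)) {
                return true;
            }
        }
        return false;
    }

    /** Downside (1): tokens the token store has to track if every application gets one per instance. */
    static long totalTokens(long applications, int kmsInstances) {
        return applications * kmsInstances;
    }

    public static void main(String[] args) {
        List<String> kms = Arrays.asList("KMS1", "KMS2");
        Set<String> kms1Down = new HashSet<>(Arrays.asList("KMS1"));
        Set<String> kms2Down = new HashSet<>(Arrays.asList("KMS2"));

        // Downside (2): tokens are acquired while KMS1 is down.
        Set<String> tokens = acquireTokens(kms, kms1Down);
        System.out.println("tokens held: " + tokens);
        System.out.println("access while KMS1 is down: " + canAccess(kms, kms1Down, tokens));
        System.out.println("access after KMS1 returns and KMS2 goes down: "
                + canAccess(kms, kms2Down, tokens));

        // Downside (1): token count scales with the number of instances.
        System.out.println("tokens for 1000 applications and 3 KMS instances: "
                + totalTokens(1000, 3));
    }
}
```

In this model the application that started during the KMS1 outage holds only the KMS2 token, so it loses access as soon as the restart moves on to KMS2.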

> LoadBalancingKMSClientProvider#addDelegationTokens should add delegation tokens from all KMS instances
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-14441
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14441
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: kms
>    Affects Versions: 2.7.0
>         Environment: CDH5.7.4, Kerberized, SSL, KMS-HA, at rest encryption
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>            Priority: Major
>         Attachments: HADOOP-14441.001.patch, HADOOP-14441.002.patch, HADOOP-14441.003.patch, HADOOP-14441.004.patch, HADOOP-14441.branch-2.005.patch
>
>
> LoadBalancingKMSClientProvider only gets a delegation token from one KMS instance, in a round-robin fashion. This is arguably a bug, as the JavaDoc for {{KeyProviderDelegationTokenExtension#addDelegationTokens}} states:
> {quote}
> /**
>      * The implementer of this class will take a renewer and add all
>      * delegation tokens associated with the renewer to the 
>      * <code>Credentials</code> object if it is not already present, 
> ...
> **/
> {quote}
> This bug doesn't pop up very often, because HDFS clients such as MapReduce unintentionally call {{FileSystem#addDelegationTokens}} multiple times.
> We have a custom client that accesses HDFS/KMS-HA using delegation tokens, and we were puzzled why it always threw "Failed to find any Kerberos tgt" exceptions when talking to one KMS but not the other. It turns out that the client couldn't talk to that KMS because {{FileSystem#addDelegationTokens}} only gets one KMS delegation token at a time.
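
The round-robin behaviour described in the quoted issue can be modelled with a small Java sketch. This is an illustration only, not the actual LoadBalancingKMSClientProvider code: the class RoundRobinTokenSketch and its methods are hypothetical, and the credentials are modelled as a plain set of instance names rather than a Hadoop Credentials object.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: hypothetical names, no Hadoop classes. */
class RoundRobinTokenSketch {
    private final List<String> instances;
    private int next = 0;

    RoundRobinTokenSketch(List<String> instances) {
        this.instances = new ArrayList<>(instances);
    }

    /** Reported behaviour: each call adds a token from only the next instance in rotation. */
    void addTokenFromOne(Set<String> credentials) {
        credentials.add(instances.get(next));
        next = (next + 1) % instances.size();
    }

    /** Requested behaviour: a single call adds a token from every instance. */
    void addTokensFromAll(Set<String> credentials) {
        credentials.addAll(instances);
    }

    public static void main(String[] args) {
        List<String> kms = new ArrayList<>();
        kms.add("KMS1");
        kms.add("KMS2");

        // A client that calls once ends up with a token for a single instance.
        Set<String> calledOnce = new LinkedHashSet<>();
        new RoundRobinTokenSketch(kms).addTokenFromOne(calledOnce);
        System.out.println("one call, round-robin: " + calledOnce);

        // A client that happens to call repeatedly collects tokens for all instances,
        // which is why the problem is easy to miss.
        Set<String> calledTwice = new LinkedHashSet<>();
        RoundRobinTokenSketch provider = new RoundRobinTokenSketch(kms);
        provider.addTokenFromOne(calledTwice);
        provider.addTokenFromOne(calledTwice);
        System.out.println("two calls, round-robin: " + calledTwice);

        // Collecting from all instances in one call removes that dependence on call count.
        Set<String> allAtOnce = new LinkedHashSet<>();
        new RoundRobinTokenSketch(kms).addTokensFromAll(allAtOnce);
        System.out.println("one call, all instances: " + allAtOnce);
    }
}
```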


