Posted to common-dev@hadoop.apache.org by "Kan Zhang (JIRA)" <ji...@apache.org> on 2008/10/04 03:55:44 UTC

[jira] Created: (HADOOP-4343) Adding user and service-to-service authentication to Hadoop

Adding user and service-to-service authentication to Hadoop
-----------------------------------------------------------

                 Key: HADOOP-4343
                 URL: https://issues.apache.org/jira/browse/HADOOP-4343
             Project: Hadoop Core
          Issue Type: New Feature
            Reporter: Kan Zhang
            Assignee: Kan Zhang
             Fix For: 0.20.0


Currently, Hadoop services do not authenticate users or other services. As a result, Hadoop is subject to the following security risks.

1. A user can access an HDFS or M/R cluster as any other user. This makes it impossible to enforce access control in an uncooperative environment. For example, file permission checking on HDFS can be easily circumvented.

2. An attacker can masquerade as a Hadoop service. For example, user code running on an M/R cluster can register itself as a new TaskTracker.

This JIRA is intended to be a tracking JIRA, where we discuss requirements, agree on a general approach and identify subtasks. Detailed design and implementation are the subject of those subtasks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4343) Adding user and service-to-service authentication to Hadoop

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689174#action_12689174 ] 

Kan Zhang commented on HADOOP-4343:
-----------------------------------

An additional benefit of using Hadoop's own delegation tokens for delegation, rather than Kerberos TGTs or service tickets, is that Kerberos is needed only at the "edge" of Hadoop. Delegation tokens don't depend on Kerberos and can be coupled with non-Kerberos authentication mechanisms (such as SSL) used at the edge.



[jira] Commented: (HADOOP-4343) Adding user and service-to-service authentication to Hadoop

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678881#action_12678881 ] 

Kan Zhang commented on HADOOP-4343:
-----------------------------------

Here is the authentication design I plan to implement. 

For all Hadoop services except NN, we simply use Kerberos. For NN, we complement Kerberos with a second mechanism, [DIGEST-MD5|http://www.ietf.org/rfc/rfc2831.txt] (available from the Java SASL library). A client can authenticate to NN in two ways.
* *Kerberos only* For example, a user accessing HDFS using Hadoop fs commands may use this approach.
* *Kerberos + DIGEST-MD5* In this case, Kerberos is used for the initial authentication and for setting up a secure connection between a client and NN. After that, the client can obtain a secret key from the server over the secure connection. This secret key is known only to the client and NN, and can be used by the client to authenticate to NN on subsequent accesses. Authentication using the secret key is done using the DIGEST-MD5 protocol, which doesn't involve any third party, such as the Kerberos KDC (key distribution center). The client can also delegate the secret key to others, so that they may use the key to authenticate to NN as the client. This is useful in cases where an M/R job needs to access NN as the job owner. Hereinafter, we refer to the secret key as a *delegation token*. The reasons for introducing the delegation token (and the associated DIGEST-MD5 mechanism) are as follows.
** *Performance* On a Map/Reduce cluster, there can be thousands of Tasks running at the same time. If they use Kerberos to authenticate to NN, they need either a delegated TGT (ticket granting ticket) or a delegated service ticket. With a delegated TGT, the Kerberos KDC could become a bottleneck, since each Task needs to get a Kerberos service ticket from the KDC using the delegated TGT. Using delegation tokens avoids that network traffic to the KDC. Another option is to use a delegated service ticket. Delegated service tickets can be used in much the same way as delegation tokens, i.e., without the need to contact an online third party like the KDC. However, Java GSS-API doesn't support service ticket delegation. We may need to use a third-party (native) Kerberos library, which requires significantly more development effort and makes the code less portable.
** *Credential renewal* For Tasks to use Kerberos, the Task owner's Kerberos TGT or service ticket needs to be delegated and made available to the Tasks. Both TGTs and service tickets can be renewed for long-running jobs (up to the max lifetime set at initial issuing). However, during Kerberos renewal, a new TGT or service ticket is issued, which needs to be distributed to all running Tasks. With delegation tokens, the renewal mechanism can be designed so that only the validity period of a token is extended on NN, while the token itself stays the same. Hence, no new tokens need to be issued and pushed to running Tasks. Moreover, Kerberos tickets have to be renewed before the current validity period expires, which puts a timing constraint on the renewal operation. Our delegation tokens can be renewed (or revived) by the designated renewer after the current validity period expires (but within the max lifetime). Being able to renew an expired delegation token is not considered a big risk since (unlike Kerberos) only the designated renewer can renew a token. A stolen token can't be renewed by the attacker.
** *Less damage when a credential is compromised* A user's Kerberos TGT may be used to access services other than HDFS. If a delegated TGT is used and compromised, the damage is greater than with an HDFS-only credential (delegation token). On the other hand, using a delegated service ticket is equivalent to using a delegation token.




[jira] Commented: (HADOOP-4343) Adding user and service-to-service authentication to Hadoop

Posted by "Kan Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678899#action_12678899 ] 

Kan Zhang commented on HADOOP-4343:
-----------------------------------

More details on the delegation token design.

h4. Overview

After initial authentication to NN using Kerberos credentials, a user may obtain a delegation token, which can be given to user jobs for subsequent authentication to NN as the user. The token is in fact a secret key shared between the user and NN and should be protected when passed over insecure channels. Anyone who gets it can impersonate the user on NN. Note that *a user can only obtain new tokens after authenticating using Kerberos*.

When a user obtains a delegation token from NN, the user should tell NN who the designated token renewer is. The designated renewer should authenticate to NN as itself when renewing the token for the user. Renewing a token means extending the validity period of that token on NN. No new token is issued. The old token continues to work. To let a Map/Reduce job use a delegation token, the user needs to designate JT as the token renewer. All the Tasks of the same job use the same token. JT is responsible for keeping the token valid until the job is finished. After that, JT may optionally cancel the token.

h4. Design

Here is the format of a delegation token.

{noformat}
TokenID = {ownerID, renewerID, issueDate, maxDate}
TokenAuthenticator = HMAC(masterKey, TokenID)
Delegation Token = {TokenID, TokenAuthenticator}
{noformat}

NN chooses {{masterKey}} randomly and uses it to generate and verify delegation tokens. NN keeps all active tokens in memory and associates each token with an {{expiryDate}}. If {{currentTime > expiryDate}}, the token is considered expired and any client authentication request using the token will be rejected. Expired tokens will be deleted from memory. A token is also deleted from memory when the owner or the renewer cancels the token.
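
The token format above can be sketched in Java as follows. This is only an illustration under stated assumptions, not the planned implementation: the class and method names are hypothetical, the byte encoding of {{TokenID}} (two UTF strings followed by two longs) is invented for the example, and HmacSHA256 is an assumed choice since the design only says HMAC.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical sketch of the delegation token format; not Hadoop's actual classes. */
final class DelegationTokenSketch {

    /** TokenID = {ownerID, renewerID, issueDate, maxDate}, serialized in a fixed order (assumed encoding). */
    static byte[] encodeTokenId(String ownerId, String renewerId, long issueDate, long maxDate)
            throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(ownerId);
            out.writeUTF(renewerId);
            out.writeLong(issueDate);
            out.writeLong(maxDate);
        }
        return buf.toByteArray();
    }

    /** TokenAuthenticator = HMAC(masterKey, TokenID). HmacSHA256 is an assumption. */
    static byte[] authenticator(byte[] masterKey, byte[] tokenId) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
        return mac.doFinal(tokenId);
    }

    /** NN chooses masterKey randomly; the 32-byte length is an assumption. */
    static byte[] newMasterKey() {
        byte[] key = new byte[32];
        new SecureRandom().nextBytes(key);
        return key;
    }

    public static void main(String[] args) throws Exception {
        byte[] masterKey = newMasterKey();
        long now = System.currentTimeMillis();
        long sevenDaysMillis = 7L * 24 * 60 * 60 * 1000;
        byte[] tokenId = encodeTokenId("alice", "jobtracker", now, now + sevenDaysMillis);
        byte[] tokenAuthenticator = authenticator(masterKey, tokenId);
        // Delegation Token = {TokenID, TokenAuthenticator}
        System.out.println("TokenID bytes: " + tokenId.length
                + ", TokenAuthenticator bytes: " + tokenAuthenticator.length);
    }
}
```

Because {{TokenAuthenticator}} is a deterministic function of {{masterKey}} and {{TokenID}}, NN can recompute it on demand from the {{TokenID}} a client presents.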

*Using Delegation Token* When a client (e.g., a Task) uses a delegation token to authenticate, it first sends {{TokenID}} to NN (but never sends the associated {{TokenAuthenticator}} to NN). {{TokenID}} identifies the token the client intends to use. Using {{TokenID}} and {{masterKey}}, NN can re-compute {{TokenAuthenticator}} and the token. NN checks if the token is valid. A token is valid if and only if the token exists in memory and {{currentTime < expiryDate}} holds for the {{expiryDate}} associated with the token. If the token is valid, the client and NN will try to authenticate each other using their own {{TokenAuthenticator}} as the secret key and [DIGEST-MD5|http://www.ietf.org/rfc/rfc2831.txt] as the protocol. Note that during authentication, one party never reveals its own {{TokenAuthenticator}} to the other party. If authentication fails (which means the client and NN do not share the same {{TokenAuthenticator}}), they don't get to know each other's {{TokenAuthenticator}}.
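
As a rough illustration of the exchange described above, the following sketch runs a DIGEST-MD5 handshake in a single process using the standard Java SASL API ({{javax.security.sasl}}). Everything specific to the example is an assumption: the class name, the protocol and server names ("hdfs", "namenode"), the use of plain strings for {{TokenID}} and {{TokenAuthenticator}}, and the map that stands in for NN recomputing {{TokenAuthenticator}} from {{masterKey}}.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.callback.Callback;
import javax.security.auth.callback.CallbackHandler;
import javax.security.auth.callback.NameCallback;
import javax.security.auth.callback.PasswordCallback;
import javax.security.auth.callback.UnsupportedCallbackException;
import javax.security.sasl.AuthorizeCallback;
import javax.security.sasl.RealmCallback;
import javax.security.sasl.Sasl;
import javax.security.sasl.SaslClient;
import javax.security.sasl.SaslException;
import javax.security.sasl.SaslServer;

/** Hypothetical in-process sketch; a real client and NN would exchange these bytes over RPC. */
final class DigestMd5Sketch {

    /** Client: TokenID acts as the user name, TokenAuthenticator as the password. */
    static CallbackHandler clientHandler(String tokenId, char[] authenticator) {
        return callbacks -> {
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    ((NameCallback) cb).setName(tokenId);
                } else if (cb instanceof PasswordCallback) {
                    ((PasswordCallback) cb).setPassword(authenticator);
                } else if (cb instanceof RealmCallback) {
                    RealmCallback rc = (RealmCallback) cb;
                    rc.setText(rc.getDefaultText()); // accept the realm offered by the server
                } else {
                    throw new UnsupportedCallbackException(cb);
                }
            }
        };
    }

    /** Server: the map stands in for NN recomputing HMAC(masterKey, TokenID). */
    static CallbackHandler serverHandler(Map<String, char[]> secrets) {
        return callbacks -> {
            String tokenId = null;
            for (Callback cb : callbacks) {
                if (cb instanceof NameCallback) {
                    tokenId = ((NameCallback) cb).getDefaultName(); // name sent by the client
                }
            }
            for (Callback cb : callbacks) {
                if (cb instanceof PasswordCallback) {
                    char[] secret = secrets.get(tokenId);
                    if (secret == null) {
                        throw new IOException("unknown or invalid token");
                    }
                    ((PasswordCallback) cb).setPassword(secret);
                } else if (cb instanceof AuthorizeCallback) {
                    AuthorizeCallback ac = (AuthorizeCallback) cb;
                    ac.setAuthorized(ac.getAuthenticationID().equals(ac.getAuthorizationID()));
                }
                // NameCallback and RealmCallback need no further action on the server side.
            }
        };
    }

    /** Returns true only if both sides complete the handshake with the same secret. */
    static boolean authenticate(String tokenId, char[] clientSecret, Map<String, char[]> serverSecrets) {
        Map<String, String> props = new HashMap<>();
        props.put(Sasl.QOP, "auth");
        try {
            SaslServer server = Sasl.createSaslServer(
                    "DIGEST-MD5", "hdfs", "namenode", props, serverHandler(serverSecrets));
            SaslClient client = Sasl.createSaslClient(
                    new String[] {"DIGEST-MD5"}, null, "hdfs", "namenode", props,
                    clientHandler(tokenId, clientSecret));
            if (server == null || client == null) {
                throw new IllegalStateException("DIGEST-MD5 is not available in this JRE");
            }
            byte[] challenge = server.evaluateResponse(new byte[0]);  // server nonce
            byte[] response = client.evaluateChallenge(challenge);    // proof of client's secret
            byte[] serverProof = server.evaluateResponse(response);   // throws on mismatch
            client.evaluateChallenge(serverProof);                    // client verifies the server
            return server.isComplete() && client.isComplete();
        } catch (SaslException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, char[]> nnSecrets = new HashMap<>();
        nnSecrets.put("token-1", "shared-authenticator".toCharArray());
        System.out.println(authenticate("token-1", "shared-authenticator".toCharArray(), nnSecrets));
        System.out.println(authenticate("token-1", "stolen-guess".toCharArray(), nnSecrets));
    }
}
```

Only digests derived from the secret cross the wire in this exchange, which matches the point that neither party reveals its {{TokenAuthenticator}} to the other.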

*Token Renewal* Delegation tokens need to be renewed periodically to keep them valid. Suppose JT is the designated renewer for a token. During renewal, JT authenticates to NN as JT. After successful authentication, JT sends the token to be renewed to NN. NN verifies that 1) JT is the renewer specified in {{TokenID}}, 2) {{TokenAuthenticator}} is correct, and 3) {{currentTime < maxDate}} specified in {{TokenID}}. Upon successful verification, if the token exists in memory, which means the token is currently valid, NN sets its new {{expiryDate}} to {{min(currentTime+renewPeriod, maxDate)}}. If the token doesn't exist in memory, which indicates NN has restarted and therefore lost all previously stored tokens, NN adds the token to memory and sets its {{expiryDate}} in the same way. The latter case allows jobs to survive NN restarts. All JT has to do is renew all tokens with NN after NN restarts and before relaunching failed Tasks.

Note that the designated renewer can revive an expired (or canceled) token by simply renewing it, if {{currentTime < maxDate}} specified in the token. This is because NN can't tell the difference between a token that has expired (or has been canceled) and a token that is not in memory because NN restarted. Since only the designated renewer can revive an expired (or canceled) token, this doesn't seem to be a security problem. An attacker who steals the token can't renew or revive it.
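
The validity and renewal rules above can be sketched as a small in-memory store. This is a simplified illustration with hypothetical names: the {{authenticatorOk}} flag stands in for NN recomputing and comparing {{TokenAuthenticator}}, and times are plain millisecond values supplied by the caller.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of NN's in-memory token table; not Hadoop's actual classes. */
final class TokenRenewalSketch {

    /** TokenID = {ownerID, renewerID, issueDate, maxDate}. */
    static final class TokenId {
        final String ownerId;
        final String renewerId;
        final long issueDate;
        final long maxDate;

        TokenId(String ownerId, String renewerId, long issueDate, long maxDate) {
            this.ownerId = ownerId;
            this.renewerId = renewerId;
            this.issueDate = issueDate;
            this.maxDate = maxDate;
        }

        String key() {
            return ownerId + "|" + renewerId + "|" + issueDate + "|" + maxDate;
        }
    }

    private final Map<String, Long> expiryByToken = new HashMap<>();
    private final long renewPeriod;

    TokenRenewalSketch(long renewPeriod) {
        this.renewPeriod = renewPeriod;
    }

    /** Valid if and only if the token is in memory and currentTime < expiryDate. */
    boolean isValid(TokenId id, long currentTime) {
        Long expiryDate = expiryByToken.get(id.key());
        return expiryDate != null && currentTime < expiryDate;
    }

    /**
     * Renews (or revives) a token and returns its new expiryDate.
     * authenticatedCaller is the identity the caller proved to NN, e.g. via Kerberos.
     */
    long renew(String authenticatedCaller, TokenId id, boolean authenticatorOk, long currentTime) {
        if (!id.renewerId.equals(authenticatedCaller)) {
            throw new SecurityException("caller is not the designated renewer");
        }
        if (!authenticatorOk) {
            throw new SecurityException("TokenAuthenticator does not match");
        }
        if (currentTime >= id.maxDate) {
            throw new SecurityException("token is past its max lifetime");
        }
        long expiryDate = Math.min(currentTime + renewPeriod, id.maxDate);
        // put() covers both cases: a token still in memory, and one that is missing
        // because it expired, was canceled, or NN restarted.
        expiryByToken.put(id.key(), expiryDate);
        return expiryDate;
    }

    /** The owner or the renewer may cancel; the caller check is omitted in this sketch. */
    void cancel(TokenId id) {
        expiryByToken.remove(id.key());
    }

    public static void main(String[] args) {
        TokenRenewalSketch nn = new TokenRenewalSketch(100L);
        TokenId id = new TokenId("alice", "jt", 0L, 250L);
        System.out.println("expiry after first renewal: " + nn.renew("jt", id, true, 10L));
        nn.cancel(id);
        System.out.println("expiry after revival: " + nn.renew("jt", id, true, 200L));
    }
}
```

Note that the renew path never needs to know why a token is absent from memory, which is exactly why a restart, an expiry, and a cancellation are indistinguishable to NN.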

The {{masterKey}} needs to be updated periodically. NN only needs to persist the {{masterKey}} on disk, not the tokens.


