Posted to dev@knox.apache.org by larry mccay <la...@gmail.com> on 2013/07/02 22:09:25 UTC

Discussion thread started on common-dev for Hadoop SSO

The following email was just sent to common-dev to start discussions in the
community around SSO for Hadoop. Common-dev was agreed upon as the mailing
list for general security efforts until we have a need to break out a
security-specific one.

If you have any comments on this topic, please feel free to comment on the
thread in common-dev. I'm just providing it here for convenience to the Knox
team, as we will need to stay aligned with this more general effort.

All -

As a follow-up to the discussions held during Hadoop Summit, I would like to
introduce a discussion topic around the moving parts of a Hadoop SSO/Token
Service.
There are a couple of related Jiras that can be referenced and may or may
not be updated as a result of this discussion thread.

https://issues.apache.org/jira/browse/HADOOP-9533
https://issues.apache.org/jira/browse/HADOOP-9392

As the first aspect of the discussion, we should probably state the overall
goals and scoping for this effort:
* An alternative authentication mechanism to Kerberos for user
authentication
* A broader capability for integration into enterprise identity and SSO
solutions
* Possibly the advertisement/negotiation of available authentication
mechanisms
* Backward compatibility for the existing use of Kerberos
* No (or minimal) changes to existing Hadoop tokens (delegation, job, block
access, etc.)
* Pluggable authentication mechanisms across RPC, REST, and web UI
enforcement points
* Continued support for existing authorization policy/ACLs, etc.
* Keeping more fine-grained authorization policies in mind - like
attribute-based access control
  - fine-grained access control is a separate but related effort that we
must not preclude with this effort
* Cross cluster SSO

In order to tease out the moving parts, here are a couple of high-level,
simplified descriptions of the SSO interaction flow:
                         +------+
+------+  credentials 1  | SSO  |
|CLIENT|---------------->|SERVER|
+------+  :tokens        +------+
  2 |
    | access token
    V :requested resource
+-------+
|HADOOP |
|SERVICE|
+-------+
The above diagram represents the simplest interaction model for an SSO
service in Hadoop.
1. client authenticates to SSO service and acquires an access token
  a. client presents credentials to an authentication service endpoint
exposed by the SSO server (AS) and receives a token representing the
authentication event and verified identity
  b. client then presents the identity token from 1.a. to the token
endpoint exposed by the SSO server (TGS) to request an access token to a
particular Hadoop service and receives an access token
2. client presents the Hadoop access token to the Hadoop service for which
the access token has been granted and requests the desired resource or
service
  a. access token is presented as appropriate for the service endpoint
protocol being used
  b. Hadoop service token validation handler validates the token and
verifies its integrity and the identity of the issuer
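
To make this flow concrete, here is a minimal sketch of what a REST-based
client interaction might look like. The endpoint paths, parameter names and
token formats below are invented placeholders for illustration, not a
proposed API:

    import java.io.*;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class SsoClientSketch {

        // Hypothetical endpoints - the actual paths/params are part of
        // what this thread needs to decide.
        static final String AS_ENDPOINT  = "https://sso.example.com/auth";
        static final String TGS_ENDPOINT = "https://sso.example.com/token";

        public static void main(String[] args) throws IOException {
            // 1.a. present credentials to the AS endpoint and receive a
            //      token representing the authentication event
            String identityToken = post(AS_ENDPOINT, "user=alice&password=secret");

            // 1.b. present the identity token to the TGS endpoint to request
            //      an access token for a particular Hadoop service
            String accessToken = post(TGS_ENDPOINT,
                    "identityToken=" + identityToken + "&service=hdfs");

            // 2. present the access token to the Hadoop service as appropriate
            //    for the endpoint protocol (e.g. an HTTP header for REST)
            System.out.println("would call the Hadoop service with: " + accessToken);
        }

        static String post(String endpoint, String body) throws IOException {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(endpoint).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.UTF_8));
            }
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    conn.getInputStream(), StandardCharsets.UTF_8))) {
                return in.readLine(); // assume the token comes back on one line
            }
        }
    }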

    +------+
    |  IdP |
    +------+
    1   ^ credentials
        | :idp_token
        |
+------+  idp_token  2   +------+
|CLIENT|---------------->| SSO  |
+------+  :tokens        |SERVER|
                         +------+
  3 |
    | access token
    V :requested resource
+-------+
|HADOOP |
|SERVICE|
+-------+

The above diagram represents a slightly more complicated interaction model
for an SSO service in Hadoop that removes Hadoop from the credential
collection business.
1. client authenticates to a trusted identity provider within the
enterprise and acquires an IdP-specific token
  a. client presents credentials to an enterprise IdP and receives a token
representing the authenticated identity
2. client authenticates to SSO service and acquires an access token
  a. client presents idp_token to an authentication service endpoint
exposed by the SSO server (AS) and receives a token representing the
authentication event and verified identity
  b. client then presents the identity token from 2.a. to the token
endpoint exposed by the SSO server (TGS) to request an access token to a
particular Hadoop service and receives an access token
3. client presents the Hadoop access token to the Hadoop service for which
the access token has been granted and requests the desired resource or
service
  a. access token is presented as appropriate for the service endpoint
protocol being used
  b. Hadoop service token validation handler validates the token and
verifies its integrity and the identity of the issuer
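
The only new moving part relative to the first flow is step 2.a: the client
hands the SSO server a token from the enterprise IdP (e.g. a SAML assertion
or a JWT) instead of raw credentials. A hypothetical fragment, reusing the
post() helper from the sketch above (the federation endpoint name is
invented):

    // step 1: obtain a token from the enterprise IdP - out of scope here
    String idpToken = "...token from the enterprise IdP...";

    // step 2.a: federate the IdP token for an SSO identity token
    String identityToken = post("https://sso.example.com/federation",
            "idp_token=" + idpToken);

    // steps 2.b and 3 then proceed exactly as in the simpler flow
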
Considering the above set of goals and the high-level interaction flows
described, we can start to discuss the component inventory required to
accomplish this vision:

1. SSO Server Instance: this component must be able to expose endpoints
both for authentication of users (by collecting and validating credentials)
and for federation of identities represented by tokens from trusted IdPs
within the enterprise. The endpoints should be composable so as to allow for
multifactor authentication mechanisms. They will also need to return tokens
that represent the authentication event and verified identity as well as
access tokens for specific Hadoop services.

2. Authentication Providers: pluggable authentication mechanisms must be
easily created and configured for use within the SSO server instance. They
will ideally allow the enterprise to plug in preferred off-the-shelf
components as well as provide custom providers. Supporting existing
standards for such authentication providers should be a top priority.
There are a number of standard approaches in use in the Java world: JAAS
LoginModules, servlet filters, JASPIC auth modules, etc. A
pluggable provider architecture that allows the enterprise to leverage
existing investments in these technologies and existing skill sets would be
ideal.
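
For example, one of the standard options named above is a JAAS LoginModule.
A minimal skeleton is sketched below; the LDAP check is an invented example
of plugging in an off-the-shelf enterprise integration:

    import java.util.Map;
    import javax.security.auth.Subject;
    import javax.security.auth.callback.*;
    import javax.security.auth.login.LoginException;
    import javax.security.auth.spi.LoginModule;

    // Skeleton of a pluggable JAAS LoginModule; the SSO server instance
    // would supply the CallbackHandler and JAAS configuration.
    public class ExampleLdapLoginModule implements LoginModule {
        private CallbackHandler handler;
        private boolean succeeded;

        public void initialize(Subject subject, CallbackHandler handler,
                               Map<String, ?> sharedState, Map<String, ?> options) {
            this.handler = handler;
        }

        public boolean login() throws LoginException {
            NameCallback name = new NameCallback("username");
            PasswordCallback pass = new PasswordCallback("password", false);
            try {
                handler.handle(new Callback[] { name, pass });
            } catch (Exception e) {
                throw new LoginException(e.getMessage());
            }
            succeeded = checkAgainstLdap(name.getName(), pass.getPassword());
            return succeeded;
        }

        public boolean commit() { return succeeded; }
        public boolean abort()  { return false; }
        public boolean logout() { return true; }

        // Placeholder - a real provider would bind to the directory here.
        private boolean checkAgainstLdap(String user, char[] password) {
            return false;
        }
    }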

3. Token Authority: a token authority component would need to have the
ability to issue, verify and revoke tokens. This authority will need to be
trusted by all enforcement points that need to verify incoming tokens.
Using something like PKI for establishing trust will be required.
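
As a rough illustration of what PKI-based issue/verify could look like, the
authority signs the token payload with its private key and enforcement
points verify with the corresponding public key. A sketch using standard
JDK crypto, with key distribution and the payload format hand-waved:

    import java.nio.charset.StandardCharsets;
    import java.security.*;

    public class TokenAuthoritySketch {
        public static void main(String[] args) throws GeneralSecurityException {
            // In practice the key pair comes from the authority's PKI/keystore;
            // generating one inline just keeps the sketch self-contained.
            KeyPair keys = KeyPairGenerator.getInstance("RSA").generateKeyPair();

            byte[] payload =
                    "user=alice;service=hdfs".getBytes(StandardCharsets.UTF_8);

            // issue: the token authority signs the payload
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(keys.getPrivate());
            signer.update(payload);
            byte[] signature = signer.sign();

            // verify: an enforcement point checks integrity and issuer
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(keys.getPublic());
            verifier.update(payload);
            System.out.println("token valid: " + verifier.verify(signature));
        }
    }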

4. Hadoop SSO Tokens: the exact shape and form of the SSO tokens will need
to be considered in order to determine the means by which trust and
integrity are ensured while using them. There may be some abstraction of
the underlying format provided through interface-based design, but all
token implementations will need to have the same attributes and
capabilities in terms of validation and cryptographic verification.
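
A sketch of what such an interface-based abstraction might look like - the
attribute names below are illustrative only, not a proposal:

    // Every token implementation, whatever its wire format, would expose
    // the same attributes needed for validation and crypto verification.
    public interface HadoopSsoToken {
        String getSubject();    // verified identity the token represents
        String getIssuer();     // token authority that issued it
        long   getExpiry();     // expiration time, epoch millis
        byte[] getPayload();    // canonical bytes that were signed
        byte[] getSignature();  // cryptographic signature over the payload
    }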

5. SSO Protocol: the lowest common denominator protocol for SSO server
interactions across client types would likely be REST. Depending on the
REST client in use, this may require explicit coding to the token flow
described in the earlier interaction descriptions, or a plugin may be
provided for things like HttpClient, curl, etc. RPC clients will have this
taken care of for them within the SASL layer and will leverage the REST
endpoints as well. This likely implies trust requirements for the RPC
client to be able to trust the SSO server's identity cert presented over
SSL.

6. REST Client Agent Plugins: required for encapsulating the interaction
with the SSO server for the various client programming models. We may need
these for many client types: e.g. Java, JavaScript, .NET, Python, curl, etc.

7. Server Side Authentication Handlers: the server side of the REST, RPC,
or web UI connection will need to be able to validate and verify the
incoming Hadoop tokens in order to grant or deny access to requested
resources.
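
On the REST/web UI side such a handler could take the shape of a servlet
filter, for example. In this sketch the header name and TokenValidator are
hypothetical placeholders:

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.*;

    public class SsoTokenFilter implements Filter {
        private final TokenValidator validator = new TokenValidator();

        public void doFilter(ServletRequest req, ServletResponse resp,
                             FilterChain chain) throws IOException, ServletException {
            HttpServletRequest httpReq = (HttpServletRequest) req;
            String token = httpReq.getHeader("X-Hadoop-SSO-Token");
            if (token != null && validator.verify(token)) {
                chain.doFilter(req, resp); // token checks out - proceed
            } else {
                ((HttpServletResponse) resp)
                        .sendError(HttpServletResponse.SC_UNAUTHORIZED);
            }
        }

        public void init(FilterConfig cfg) {}
        public void destroy() {}

        // Stand-in for the signature/expiry checks described above.
        static class TokenValidator {
            boolean verify(String token) { return false; }
        }
    }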

8. Credential/Trust Management: throughout the system - on both client and
server sides - we will need to manage and provide access to PKI and
potentially shared-secret artifacts in order to establish the required
trust relationships to replace the mutual authentication that would
otherwise be provided by using Kerberos everywhere.
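
On the client side, for instance, this might reduce to loading a truststore
that contains the SSO server's identity cert and wiring it into the SSL
layer. A JDK-level sketch; the path and password are invented:

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import javax.net.ssl.*;

    public class TrustSetupSketch {
        public static void main(String[] args) throws Exception {
            // load a truststore holding the SSO server's cert
            KeyStore trustStore = KeyStore.getInstance("JKS");
            try (FileInputStream in =
                    new FileInputStream("/etc/hadoop/sso-truststore.jks")) {
                trustStore.load(in, "changeit".toCharArray());
            }

            TrustManagerFactory tmf = TrustManagerFactory.getInstance(
                    TrustManagerFactory.getDefaultAlgorithm());
            tmf.init(trustStore);

            // any client-side SSL connection to the SSO server uses this context
            SSLContext ctx = SSLContext.getInstance("TLS");
            ctx.init(null, tmf.getTrustManagers(), null);
        }
    }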

So, discussion points:

1. Are there additional components that would be required for a Hadoop SSO
service?
2. Are any of the above components actually unnecessary, or poorly
described?
3. Should we create a new umbrella Jira to identify each of these as a
subtask?
4. Should we just continue to use HADOOP-9533 for the SSO server and add
additional subtasks?
5. What are the natural seams of separation between these components, and
what dependencies between them affect priority?

Obviously, each component that we identify will - more than likely - have a
Jira of its own, so we are only trying to identify the high-level
descriptions for now.

Can we try and drive this discussion to a close by the end of the week?
This will allow us to start breaking out into component implementation
plans.

thanks,

--larry