Posted to dev@knox.apache.org by larry mccay <lm...@apache.org> on 2016/01/16 17:10:16 UTC

[DISCUSS] Identity Assertion Requirements for 0.8.0 Release

All -

The pac4j provider contribution was committed yesterday and we are on track
for our 0.8.0 release. Note that the docs are still being massaged a bit
and will end up in the new 0.8.0 users guide book soon.

In the meantime, I'd like to start a discussion wrt the requirements for
identity assertion functionality in order to have full usecase coverage for
our new authentication/federation mechanisms.

A bit of background first...

Some of the external provider integrations that are enabled by the pac4j
provider:

1. result in a PrimaryPrincipal that is actually an id rather than a
username that could be used directly within the hadoop cluster.
2. some also allow you to configure the user profile attribute to be
returned as the subject - such as SAML (okta). So, we could at least
sometimes have it be an email address.
3. others result in an actual username as the PrimaryPrincipal
4. It is extremely likely that none of these PrimaryPrincipals will
actually line up with an enterprise username that can be used within the
cluster.

Existing identity assertion providers:

1. pseudo/default identity assertion - we have the ability to use principal
mapping to map a numeric id/email or whatever to an acceptable username
for hadoop. However, all users that would access hadoop through a topology
configured for pac4j would need to have their principal mappings defined
within the topology. Not a very scalable or manageable approach. The
topology itself would likely end up being huge and it would need to be
sync'd up across all Knox instances in the deployment.
2. regex identity assertion provider - this provider would be able to take
something like an email address PrimaryPrincipal and extract a username
from that. In some cases, like okta, this may be the proper username for
companies that use okta as a hosted SSO solution. There are no additional
principal mapping capabilities however.
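
To make the difference between the two existing providers concrete, here is
a minimal Python sketch. It is not Knox code: the mapping table, the pattern
and the function names are assumptions made up for illustration, and the
real providers are configured in the topology rather than written like this.

```python
import re

# Hypothetical static table, standing in for per-user principal mappings
# that would have to be listed in every topology.
STATIC_MAPPING = {
    "1234567890": "alice",        # numeric id from an external provider
    "bob@example.com": "bob",     # email address used as PrimaryPrincipal
}

# Hypothetical pattern: capture everything before the "@" of an email address.
EMAIL_PATTERN = re.compile(r"^([^@]+)@.+$")


def assert_with_static_mapping(principal):
    """Look the principal up in the table; unmapped principals pass through."""
    return STATIC_MAPPING.get(principal, principal)


def assert_with_regex(principal):
    """Extract the local part of an email; non-matching principals pass through."""
    match = EMAIL_PATTERN.match(principal)
    return match.group(1) if match else principal
```

The static table needs one entry per user, which is the scalability concern
above. The regex variant needs no per-user entries but can do nothing useful
with an opaque numeric id.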

So, questions/options for 0.8.0 release:

Option 1. Is static principal mapping within a topology using the
pseudo/default identity assertion provider sufficient for the first release
that has support for these external providers?

Option 2. Do we need to add principal mapping capabilities to the regex
provider to allow for the extraction of a username AND subsequently mapping
that to another username?
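
A rough sketch of what Option 2 would mean, again in Python and again with
made-up names: the extraction step runs first and the optional mapping is
applied to its result. The pattern and the override table are assumptions,
not existing Knox parameters.

```python
import re


def extract_then_map(principal, pattern, overrides):
    """Extract a username with the regex, then apply an optional mapping.

    principal: the PrimaryPrincipal from the external provider
    pattern:   compiled regex whose first group is the username
    overrides: dict of extracted username -> cluster username (may be empty)
    """
    match = pattern.match(principal)
    extracted = match.group(1) if match else principal
    # Only the exceptions need an entry; everyone else keeps the extracted name.
    return overrides.get(extracted, extracted)


email_pattern = re.compile(r"^([^@]+)@.+$")
overrides = {"jsmith": "john.smith"}  # hypothetical exception list
```

With this shape the mapping table only has to hold the users whose email
local part differs from their cluster username, rather than every user.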

Option 3. Should we create a new identity asserter that does a look up in
LDAP for mapping an id or email address to the username/CN? A more dynamic
assertion provider like this would certainly be better for scalability and
management but at the same time would require a change to LDAP schemas for
things like twitter id. Email address may not require a schema change but
would require the email address from the external provider to match that
within the corporate LDAP.

Option 4. Should we consider a central mapping storage identity assertion
provider that would interrogate some KnoxSSO specific mechanism? We could
look at a mapping of PrimaryPrincipal to DN from LDAP, to corporate email
address or directly to username. This would require some separate
registration or user sync mechanism to populate this central store and
likely couple the mappings to a particular user store like LDAP in some
way. It will also introduce a new wrinkle or consideration for Knox
upgrades having actual user data to migrate, etc. For the central store we
could consider:
     a. file in HDFS
     b. embedded HBase
     c. Hive
     d. RDBMS
     e. LDAP
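
As a sketch of the RDBMS variant (d) only, the following uses Python's
built-in sqlite3 module with an in-memory database. The table name, column
names and the register/lookup functions are assumptions for illustration;
no such store or schema exists in Knox today.

```python
import sqlite3


def open_store():
    """Create an in-memory store mapping PrimaryPrincipal to cluster username."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE principal_mapping ("
        " primary_principal TEXT PRIMARY KEY,"
        " username TEXT NOT NULL)"
    )
    return conn


def register(conn, primary_principal, username):
    """Registration / user sync step: add or update one mapping."""
    conn.execute(
        "INSERT OR REPLACE INTO principal_mapping VALUES (?, ?)",
        (primary_principal, username),
    )
    conn.commit()


def lookup(conn, primary_principal):
    """Assertion step: return the mapped username, or None if unregistered."""
    row = conn.execute(
        "SELECT username FROM principal_mapping WHERE primary_principal = ?",
        (primary_principal,),
    ).fetchone()
    return row[0] if row else None
```

Even this small version shows the two concerns raised above: something has
to call register() for every user before the first login, and the table
holds real user data that would need to be carried across Knox upgrades.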

Personally, I lean toward the following:

* Option 1 from above for the 0.8.0 release: introduce the pac4j provider
with static principal mapping using the pseudo/default assertion provider,
and possibly add support for principal mapping to the regex provider
(Option 2) for additional flexibility.

* Option 3 and/or 4 from above for a follow up release/s when we can
determine the exact design for the central store and user sync/registration
mechanism that would best meet the community needs and be sure to put the
time into the upgrade/migration considerations.

Thoughts?

thanks,

--larry

Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by larry mccay <lm...@apache.org>.
Given the current principal mapping capabilities in Knox and what it would
mean for a large number of users to have to map each one explicitly in the
topology file, I propose that the default principal mapping not be part of
the driving usecase for 0.8.0.

Therefore, the driving usecase would need to be one that allows the
username to be extracted from an available attribute such as email.

Considering that an enterprise usecase wouldn't likely have an email
address that could come from twitter, FB, etc - we should probably
concentrate on SAML. Enterprise SSO solutions are often SAML based, Okta is
becoming more and more popular for SSO in companies, and these user
profiles will more likely have an email address that can be used to extract
the enterprise username.

0.8.0 Driving Usecase/s - *Enterprise*:

SAML/Okta
* Email Address as PrimaryPrincipal
* Regex Identity Assertion Provider to extract username from email address
* Targeting the following audiences:
   1. SSO for KnoxSSO participating applications such as Ambari, Ranger and
Hadoop UIs with SAML/Okta integration
   2. Development of KnoxSSO participating applications that consume Hadoop
resources through REST APIs and Knox Gateway

0.8.0 Driving Usecase/s - *Development*:

OpenID Connect, OAuth (Facebook, Twitter, etc)
* Id as PrimaryPrincipal
* Discussions/development on new identity assertion providers while
leveraging the default and regex providers for dev/testing
* Targeting the following audiences:
   1. Developers/integrators that would like to use OpenID Connect and help
drive the principal mapping requirements and implementation in 0.9.0
   2. Developers/integrators that would like to use an OAuth provider and
help drive the principal mapping requirements and implementation in 0.9.0

I believe that this focus allows us to get a 0.8.0 release out to the
community with support for SAML/Okta for use in their environments and also
get emerging usecase functionality into the hands of the developers that
will help drive the next release to make those usecases more enterprise
ready.

Thoughts?


Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by larry mccay <lm...@apache.org>.
That sounds reasonable - I wouldn't want to try and rush something of that
level into 0.8.0.



Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by Jérôme LELEU <le...@gmail.com>.
Hi,

This is definitely an option from my point of view, but
AuthorizationGenerators can also be used for roles and permissions. Knox
0.8.0 has initial pac4j support for indirect clients. For me, it's a first
step, but the final objective is to use all the capabilities of pac4j (REST
support, authorizations...) in Knox.

So my advice would be to release Knox 0.8.0 and stay with the regular Knox
identity assertion filters. Then, for Knox 0.9.0, start thinking about how
we can define a full pac4j configuration like this one:
https://github.com/pac4j/j2e-pac4j-demo/blob/master/src/main/java/org/pac4j/demo/j2e/config/DemoConfigFactory.java
instead of only using simple parameters.

Thanks.
Best regards,
Jérôme





Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by larry mccay <lm...@apache.org>.
> We have the additional mechanism of AuthorizationGenerator you can attach
> to a Client to compute authorizations after the login and user profile
> retrieval. In fact, you could even use it to replace the identifier of the
> user profile with one of its attributes.

Are you suggesting that a new identity assertion provider based on the
AuthorizationGenerator might make sense?
I can see that as being an interesting approach:

1. configure the pac4j assertion provider and specify which attribute to
use as the authenticated user
2. optionally map the principal to the effective user

Would #1 above be consistently available across all indirect clients for
which we have support?
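
The two steps above could look roughly like the following. This is a
language-neutral illustration written in Python, not the pac4j Java API:
the UserProfile class, the attribute name and both functions are
assumptions introduced only to show the idea of switching the identifier
to a profile attribute after login and then optionally mapping it.

```python
from dataclasses import dataclass, field


@dataclass
class UserProfile:
    """Hypothetical stand-in for a user profile returned after login."""
    id: str
    attributes: dict = field(default_factory=dict)


def switch_identifier(profile, attribute_name):
    """Step 1: use the named attribute as the authenticated user, if present."""
    value = profile.attributes.get(attribute_name)
    if value is None:
        # The attribute is not available for this client; keep the original id.
        return profile
    return UserProfile(id=value, attributes=dict(profile.attributes))


def effective_user(profile, mapping):
    """Step 2: optionally map the principal to the effective user."""
    return mapping.get(profile.id, profile.id)
```

The fallback branch in switch_identifier is where the question above
matters: if an indirect client does not supply the configured attribute,
the original id is all that is left to assert.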


On Mon, Jan 18, 2016 at 4:53 AM, Jérôme LELEU <le...@gmail.com> wrote:

> Hi,
>
> The pac4j vision:
> 1) For direct clients (LDAP authentication for example) which are NOT
> currently supported by the Knox / pac4j gateway, we have two components:
> the Authenticator which validates credentials and the ProfileCreator which
> creates a user profile (by default, it relies on the data returned by the
> Authenticator). It means that for this kind of authentication, we assume we
> can have two identity sources: one for login, one to get attributes and
> username.
> 2) For indirect clients (Facebook or SAML authentication for example) which
> are currently supported by the Knox / pac4j gateway, we have only one
> component: the client which represents an authentication mechanism as we
> assume that all actions (credentials validation, user profile retrieval)
> are done via one identity source.
>
> We have the additional mechanism of AuthorizationGenerator you can attach
> to a Client to compute authorizations after the login and user profile
> retrieval. In fact, you could even use it to switch the identifier of the
> user profile by one of its attribute.
>
> Thanks.
> Best regards,
> Jérôme

Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by Jérôme LELEU <le...@gmail.com>.
Hi,

The pac4j vision:
1) For direct clients (LDAP authentication for example) which are NOT
currently supported by the Knox / pac4j gateway, we have two components:
the Authenticator which validates credentials and the ProfileCreator which
creates a user profile (by default, it relies on the data returned by the
Authenticator). It means that for this kind of authentication, we assume we
can have two identity sources: one for login, one to get attributes and
username.
2) For indirect clients (Facebook or SAML authentication for example) which
are currently supported by the Knox / pac4j gateway, we have only one
component: the client which represents an authentication mechanism as we
assume that all actions (credentials validation, user profile retrieval)
are done via one identity source.

We have the additional mechanism of AuthorizationGenerator you can attach
to a Client to compute authorizations after the login and user profile
retrieval. In fact, you could even use it to replace the identifier of the
user profile with one of its attributes.

Thanks.
Best regards,
Jérôme





Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by larry mccay <la...@gmail.com>.
Oh, nice!

I think that it may have been better to leverage the existing principal
mapping implementation, if possible.
Unfortunately, it doesn't seem to have been made available in the abstract
bases yet.

I could actually see the regex.lookup as a slightly different beast than
principal mapping, although we could debate whether both are needed.

It seems like:

1. regex.output/lookup are a means of establishing/normalizing the
authenticated principal
2. principal.mapping would be a means of mapping the authenticated
principal to the effective principal
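
The two stages could be sketched roughly as follows. This is only an
illustration in Python, not Knox code: normalize stands in for the regex
step, PRINCIPAL_MAPPING is a made-up table, and the "guest" entry is just the
example used in this thread.

```python
import re

# Hypothetical mapping table: authenticated principal -> effective principal.
PRINCIPAL_MAPPING = {"somebody": "guest"}


def normalize(primary_principal):
    """Stage 1: derive the authenticated principal (here, the local part of an email)."""
    match = re.fullmatch(r"(.*)@.*", primary_principal)
    return match.group(1) if match else primary_principal


def effective_principal(primary_principal):
    """Stage 2: map the authenticated principal, keeping it as-is when unmapped."""
    authenticated = normalize(primary_principal)
    return PRINCIPAL_MAPPING.get(authenticated, authenticated)


print(effective_principal("somebody@us.imaginary.com"))  # guest
print(effective_principal("alice@us.imaginary.com"))     # alice
```

Keeping the stages separate means the normalization rule can stay generic
while the per-user mapping remains a small, explicit table.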

Given the current regex implementation, would we be able to take
somebody@us.imaginary.com and map that to "guest"?
I guess if we started with a completely different regex.input then we could
easily do that.

Another consideration would be to standardize on the regex mechanism for
doing principal mapping - which would mean moving some or all of that into
the abstract base for identity assertion providers.

On Mon, Jan 18, 2016 at 10:18 AM, Kevin Minder <kevin.minder@hortonworks.com> wrote:

> WRT Option 2 below: The regex identity assertion mapper can already do
> what is described.  Given the configuration below it will turn
> somebody@us.imaginary.com into somebody_USA. The {[2]} takes the value
> from the second matching group and looks it up in the lookup table.
>
> <provider>
>   <role>identity-assertion</role>
>   <name>Regex</name>
>   <enabled>true</enabled>
>   <param>
>     <name>input</name>
>     <value>(.*)@(.*?)\..*</value>
>   </param>
>   <param>
>     <name>output</name>
>     <value>{1}_{[2]}</value>
>   </param>
>   <param>
>     <name>lookup</name>
>     <value>us=USA;ca=CANADA</value>
>   </param>
> </provider>

Re: [DISCUSS] Identity Assertion Requirements for 0.8.0 Release

Posted by Kevin Minder <ke...@hortonworks.com>.
WRT Option 2 below: The regex identity assertion mapper can already do what is described.  Given the configuration below it will turn somebody@us.imaginary.com into somebody_USA. The {[2]} takes the value from the second matching group and looks it up in the lookup table.

<provider>
  <role>identity-assertion</role>
  <name>Regex</name>
  <enabled>true</enabled>
  <param>
    <name>input</name>
    <value>(.*)@(.*?)\..*</value>
  </param>
  <param>
    <name>output</name>
    <value>{1}_{[2]}</value>
  </param>
  <param>
    <name>lookup</name>
    <value>us=USA;ca=CANADA</value>
  </param>
</provider>  
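
To make the substitution concrete, here is a minimal Python sketch that
emulates the input/output/lookup behaviour described above. It is not the
Knox implementation: the expand_principal and parse_lookup names are
hypothetical, and both fallbacks (keeping the raw group value when the lookup
has no entry, and passing non-matching principals through unchanged) are
assumptions made for illustration.

```python
import re


def parse_lookup(spec):
    """Turn 'us=USA;ca=CANADA' into {'us': 'USA', 'ca': 'CANADA'}."""
    return dict(pair.split("=", 1) for pair in spec.split(";") if pair)


def expand_principal(principal, input_regex, output_template, lookup_spec=""):
    """Hypothetical helper: apply the input regex, then fill in the output template.

    {N}   is replaced by capture group N as-is.
    {[N]} is replaced by the lookup-table entry for capture group N
          (assumption: the raw group value is kept when there is no entry).
    """
    match = re.fullmatch(input_regex, principal)
    if match is None:
        return principal  # assumption: non-matching principals pass through
    lookup = parse_lookup(lookup_spec)

    def substitute(token):
        group_value = match.group(int(token.group(2)))
        if token.group(1):  # '[' present, i.e. the {[N]} form
            return lookup.get(group_value, group_value)
        return group_value

    return re.sub(r"\{(\[)?(\d+)\]?\}", substitute, output_template)


mapped = expand_principal(
    "somebody@us.imaginary.com",
    r"(.*)@(.*?)\..*",
    "{1}_{[2]}",
    "us=USA;ca=CANADA",
)
print(mapped)  # somebody_USA
```

Here group 1 captures "somebody" and group 2 captures "us", so {1} is copied
verbatim and {[2]} is resolved through the lookup table to "USA".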



