Posted to dev@sentry.apache.org by Bhooshan Mogal <bh...@gmail.com> on 2016/08/03 15:49:37 UTC

Re: Delegation tokens?

Hi Folks,

Any thoughts?

-
Bhooshan

On Sat, Jul 30, 2016 at 8:33 AM, Bhooshan Mogal <bh...@gmail.com>
wrote:

> Hi,
>
> Does the Sentry Service provide delegation tokens so that processes without
> Kerberos credentials (for example, in YARN containers) can communicate with it?
>
>
> Use case: We have programs running in YARN that access entities on which
> authorization is enforced using Apache Sentry. There is a master process
> that can communicate with Sentry just fine using its Kerberos credentials.
> We have some level of caching implemented for ACLs as well, so we don't
> have to hit Sentry for every authorization request. However, given that
> this is a security feature, the cache needs to be updated very frequently.
> Going via the master for every cache update would create a bottleneck. So
> we wanted to explore whether a dedicated service running in YARN containers
> (not every program, just that one service) could communicate with Sentry
> using delegation tokens. Exposing the master's Kerberos credentials to such
> a service is not an option because it would create a security loophole.
>
> This would be similar to what KMS offers via
> https://issues.apache.org/jira/browse/HADOOP-10769.
>
>
> Thanks in advance,
> Bhooshan
>
>


-- 
Bhooshan

Re: Delegation tokens?

Posted by Sravya Tirukkovalur <sr...@cloudera.com>.

> On Aug 3, 2016, at 4:57 PM, Bhooshan Mogal <bh...@gmail.com> wrote:
> 
> Thanks for the info, Sravya. Are you proposing a cache in the Java Sentry
> Client? That could potentially be useful.
> 
Yes, eventually: a client cache that relies on an update log in Sentry. The update log is discussed as part of the Sentry HA redesign in SENTRY-872, FYI.

> The dedicated service is necessary primarily because programs run in YARN
> as the users who submitted the programs. These users are not expected to be
> whitelisted in the Sentry Service. Hence the need for this extra hop.
> 
I see. There is some discussion around relaxing the whitelist requirement in Sentry, mainly driven by Kafka/Solr use cases, to allow non-service users to grant/revoke/list permissions using the Sentry CLI. Let me find the JIRA for it. That might help this use case as well. Basically, an end user should be able to authenticate with Sentry using their own Kerberos credentials and be allowed to list their own privileges.

> About the cache: each program container maintains its own cache, and each
> cache entry has a TTL. Each cache is refreshed asynchronously and
> periodically by making calls to Sentry, currently via the master. With
> delegation tokens, we'd want to move this load from the master to a
> dedicated service inside YARN (which is also used for other such "system"
> operations during program runtime that can't be run as the user who
> submitted the program).
> 
> I'd be more than willing to blog about our use-case and integration with
> Apache Sentry on the Apache blog. We're currently in the middle of testing
> this integration, and will get back to you about that soon. I want to make
> sure that all ends are covered before doing so. Our deadlines for this are
> coming up pretty soon, so I can hopefully reach out to you about the blog
> in the next couple of weeks.
> 
Sounds good!

> In the meantime, though, should we create a JIRA to add support for the
> Sentry Service to issue delegation tokens? I have seen some examples of how
> that is done in Hadoop-land over HTTP, so I'd say I could take a stab at
> it, but I'm unsure of how to go about it in Sentry because Sentry uses
> Thrift.
> 
Nevertheless, delegation tokens do seem like a useful feature that could potentially avoid trusted impersonation in Sentry as well. Please feel free to file a JIRA, as always. And thanks for volunteering to take a stab at it! I am not intimately familiar with the implementation details, but I can dig into them a bit more as well.
> 
> Thanks,
> Bhooshan
> 
> 

Re: Delegation tokens?

Posted by Bhooshan Mogal <bh...@gmail.com>.
Thanks for the info, Sravya. Are you proposing a cache in the Java Sentry
Client? That could potentially be useful.

The dedicated service is necessary primarily because programs run in YARN
as the users who submitted the programs. These users are not expected to be
whitelisted in the Sentry Service. Hence the need for this extra hop.

About the cache: each program container maintains its own cache, and each
cache entry has a TTL. Each cache is refreshed asynchronously and
periodically by making calls to Sentry, currently via the master. With
delegation tokens, we'd want to move this load from the master to a
dedicated service inside YARN (which is also used for other such "system"
operations during program runtime that can't be run as the user who
submitted the program).
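
For illustration only, here is a minimal sketch of the kind of per-container cache described above: entries carry a TTL, and a background thread refreshes them periodically. It is not the actual implementation discussed in this thread. The names TtlCache, fetch, and start_refresher are hypothetical, and fetch stands in for whatever call reaches Sentry (today via the master).

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Per-container cache whose entries expire after ttl_seconds."""

    def __init__(self, fetch: Callable[[str], frozenset], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # hypothetical lookup that ends up at Sentry
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[frozenset, float]] = {}

    def get(self, principal: str) -> frozenset:
        now = self._clock()
        with self._lock:
            entry = self._entries.get(principal)
            if entry is not None and entry[1] > now:
                return entry[0]  # fresh hit
        # Miss or expired: fetch outside the lock, then store.
        value = self._fetch(principal)
        with self._lock:
            self._entries[principal] = (value, self._clock() + self._ttl)
        return value

    def refresh_all(self) -> None:
        """Re-fetch every known key; meant to run on a background thread."""
        with self._lock:
            keys = list(self._entries)
        for key in keys:
            value = self._fetch(key)
            with self._lock:
                self._entries[key] = (value, self._clock() + self._ttl)


def start_refresher(cache: TtlCache, interval_seconds: float) -> threading.Event:
    """Start a daemon thread that refreshes the cache until the event is set."""
    stop = threading.Event()

    def loop() -> None:
        # Event.wait returns False on timeout and True once stop is set.
        while not stop.wait(interval_seconds):
            cache.refresh_all()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

In this sketch every refresh goes through fetch, so whichever process sits behind fetch absorbs the load from all containers. That is the bottleneck concern raised above.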

I'd be more than willing to blog about our use-case and integration with
Apache Sentry on the Apache blog. We're currently in the middle of testing
this integration, and will get back to you about that soon. I want to make
sure that all ends are covered before doing so. Our deadlines for this are
coming up pretty soon, so I can hopefully reach out to you about the blog
in the next couple of weeks.

In the meantime, though, should we create a JIRA to add support for the
Sentry Service to issue delegation tokens? I have seen some examples of how
that is done in Hadoop-land over HTTP, so I'd say I could take a stab at
it, but I'm unsure of how to go about it in Sentry because Sentry uses
Thrift.
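
As a transport-agnostic illustration of the idea, the sketch below shows a generic HMAC-based delegation token: the issuing service signs a token identifier with a secret that only it holds, and later verifies the signature and expiry when the token is presented. This is an assumption-laden sketch, not Sentry's API (Sentry does not support delegation tokens as of this thread). TokenIssuer, issue, verify, and the identifier fields are all hypothetical names.

```python
import hashlib
import hmac
import json
import os
import time
from typing import Optional


class TokenIssuer:
    """Issues and verifies HMAC-signed delegation tokens (illustrative only)."""

    def __init__(self, secret: Optional[bytes] = None) -> None:
        # The secret never leaves the issuing service.
        self._secret = secret if secret is not None else os.urandom(32)

    def _sign(self, identifier: bytes) -> str:
        return hmac.new(self._secret, identifier, hashlib.sha256).hexdigest()

    def issue(self, owner: str, renewer: str, lifetime_seconds: float,
              now: Optional[float] = None) -> dict:
        issued = time.time() if now is None else now
        ident = {"owner": owner, "renewer": renewer,
                 "issued": issued, "expires": issued + lifetime_seconds}
        # Serialize deterministically so verification signs the same bytes.
        raw = json.dumps(ident, sort_keys=True).encode("utf-8")
        return {"identifier": ident, "password": self._sign(raw)}

    def verify(self, token: dict, now: Optional[float] = None) -> bool:
        current = time.time() if now is None else now
        raw = json.dumps(token["identifier"], sort_keys=True).encode("utf-8")
        # Constant-time comparison avoids leaking how much of the MAC matched.
        if not hmac.compare_digest(self._sign(raw), token["password"]):
            return False
        return current < token["identifier"]["expires"]
```

In a setup like the one described here, a Kerberos-authenticated master would request a token once and hand it to the dedicated service. The service could then present the token without ever seeing the master's Kerberos credentials. How such a token would be carried over Thrift is left open, as in the question above.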


Thanks,
Bhooshan





-- 
Bhooshan

Re: Delegation tokens?

Posted by Sravya Tirukkovalur <sr...@cloudera.com>.
Thanks for bringing this up, Bhooshan!

Apache Sentry does not support delegation tokens yet. Looking at your use
case, it seems that a cache with strong (or near-strong) freshness guarantees
is the key requirement here. Sentry does plan to support a way to store delta
changes and serve these deltas through an API in the future. That would give
downstream clients that wish to keep a read-only cache of the permission
data an efficient and reliable way to keep that cache up to date.
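
To make the delta idea concrete, here is a minimal sketch of how a read-only client cache might apply such deltas. It assumes each delta carries a monotonically increasing sequence number. The Delta and PermissionCache types and their fields are hypothetical, since the delta API did not exist at the time of this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Delta:
    seq: int        # position in the (hypothetical) update log
    op: str         # "grant" or "revoke"
    role: str
    privilege: str


@dataclass
class PermissionCache:
    last_seq: int = 0
    grants: Dict[str, Set[str]] = field(default_factory=dict)

    def apply(self, deltas: Iterable[Delta]) -> bool:
        """Apply deltas in order; return False if a gap requires a full reload."""
        for d in sorted(deltas, key=lambda d: d.seq):
            if d.seq <= self.last_seq:
                continue              # already applied, safe to skip
            if d.seq != self.last_seq + 1:
                return False          # an update was missed
            privileges = self.grants.setdefault(d.role, set())
            if d.op == "grant":
                privileges.add(d.privilege)
            elif d.op == "revoke":
                privileges.discard(d.privilege)
            else:
                raise ValueError(f"unknown op: {d.op}")
            self.last_seq = d.seq
        return True
```

A client would ask for deltas after last_seq, apply them, and fall back to a full snapshot whenever apply returns False.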

Having said that, I am curious: how does your master cache the permissions
today? And is the latency of multiple RPCs in your proposed approach
acceptable (container -> this new service -> Sentry service using a
delegation token)? And how is this approach better than just making an RPC
to Sentry directly?

Also, I am sure the community would benefit greatly from your Sentry use
case. Would you be interested in blogging about it on the Apache blog?





-- 
Sravya Tirukkovalur