
Posted to dev@ranger.apache.org by Bolke de Bruin <bd...@gmail.com> on 2018/12/04 14:02:12 UTC

Allow clients to supply tag information

Hi All,

Ranger assumes that clients are tag-unaware, so the Tag Enricher depends on a resource-to-tag mapping supplied externally by, for example, Apache Atlas. We found that making tags available in Ranger can involve a prohibitive delay. For example, data arrives at the platform and is tagged programmatically in Apache Atlas. Atlas then puts this information on Kafka and Ranger picks it up. The client (or another component) then needs to refresh its policies before the tagging info becomes available for evaluation. This is often too slow: Kafka introduces a lag, and the policy refresh introduces another (tested).

If the client is tag-aware and could supply this information to the plugin, policy evaluation could proceed without waiting. I have created https://issues.apache.org/jira/browse/RANGER-2302 to track this, along with an initial patch. The patch allows a client to set the special “RangerTagEnricher.KEY_CLIENT_TAGS” as a value in the access request, which is then picked up by the Tag Enricher. Currently, client-supplied tags overwrite the system-supplied tags, because the client might have more recent information. Most likely this will need to be checked against the “updated” field in the tag itself, but that wasn't readily available.
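A minimal sketch of the behaviour described above, using a plain map as the request context. Every name here (ClientTagEnricherSketch, the key strings, the resource-to-tags map) is hypothetical and only models the idea; it is not Ranger's actual RangerTagEnricher or request API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the proposed behaviour; not Ranger's actual classes. */
class ClientTagEnricherSketch {
    // Assumed key names; the real constant values in the patch may differ.
    static final String KEY_CLIENT_TAGS = "CLIENT_TAGS";
    static final String KEY_CONTEXT_TAGS = "TAGS";

    // Resource path -> tags, as downloaded from ranger-admin (fed by tagsync/Atlas).
    private final Map<String, Set<String>> systemTags;

    ClientTagEnricherSketch(Map<String, Set<String>> systemTags) {
        this.systemTags = systemTags;
    }

    /** Enriches the request context in place; client-supplied tags win, as in the initial patch. */
    @SuppressWarnings("unchecked")
    void enrich(String resource, Map<String, Object> context) {
        Object clientTags = context.get(KEY_CLIENT_TAGS);
        Set<String> effective = clientTags instanceof Set
                ? (Set<String>) clientTags
                : systemTags.getOrDefault(resource, Collections.emptySet());
        context.put(KEY_CONTEXT_TAGS, effective);
    }
}
```

With this rule, a request whose context carries client tags is evaluated against those tags only; requests without them fall back to the tags downloaded from ranger-admin.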

I am looking for feedback to see if we can have this in. Or are there other ways to solve this?

Cheers
Bolke

Re: Allow clients to supply tag information

Posted by Bolke de Bruin <bd...@gmail.com>.

Hi Don (apologies for the earlier misspelling),

I was a bit off in my analysis; I mixed up client and plugin. getTags can definitely work, but I don’t think we should move everything to the “advanced” side of the plugin. In other words, I think the TagEnricher should still do its thing.

I’m going to be AFK for a while, but I’ll pick this up after that.

B.


Re: Allow clients to supply tag information

Posted by Bolke de Bruin <bd...@gmail.com>.

Hi Dan,

Thanks for thinking along. Answers inline again.

B.

Sent from my iPad

> On 7 Dec 2018, at 19:42, Don Bosco Durai <bo...@apache.org> wrote:
> 
> Hi Bolke
> 
> Thanks for the suggestion and contribution.
> 
> I am trying to understand your approach. Can you add your patch to Review Board? It will be easier to see the changes you have made visually.

Will do after fixing the selection of which tag version takes precedence (see below).

> 
> I have a few questions and design suggestions.
> 1. When you say "Client", are you referring to "Plugin Implementation Code" or "End User" (similar to the Accumulo model)?

Plugin. I don’t trust the user :-). At least, if by plugin you mean the agent inside, for example, Hive.

> 2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good suggestion and we should support it. If it is end-user, then it is a longer discussion.

No long discussion then :-) 

> 3. Based on the discussion so far and reviewing the code change at a high level, it seems you are extending at the Tag Enricher level. Alternatively, would it be more design-friendly to provide a method/API in the plugin interface to return or override the tags, e.g. getTags(request)? Custom plugins can override this method and alter the tags to be returned. This might be a more isolated and cleaner implementation, so that plugin writers can focus only on their plugin implementation.


That could work. Taking that a bit further, I was always thinking that ‘context’ was something to be supplied by the client. It caught me by surprise that a client (i.e. plugin) can actually set no additional context at all. So instead of a getTags, I suggest an “updateContext” with namespaced keys. I think that is more future-proof. However, it won’t be backwards compatible (this is why I chose to implement it as it is now): when you upgrade Ranger you would need to update all your clients at once. Adding getTags would do the same. What do you think?
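A minimal sketch of what an “updateContext” with namespaced keys might look like. The class, the method signatures and the "namespace:key" encoding are assumptions for illustration, not an existing Ranger API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a client-writable request context with namespaced keys. */
class NamespacedContext {
    private final Map<String, Object> values = new HashMap<>();

    /** Keys are stored as "namespace:key" so clients cannot clobber engine-internal entries. */
    void updateContext(String namespace, String key, Object value) {
        if (namespace == null || namespace.isEmpty() || namespace.contains(":")) {
            throw new IllegalArgumentException("invalid namespace: " + namespace);
        }
        values.put(namespace + ":" + key, value);
    }

    Object get(String namespace, String key) {
        return values.get(namespace + ":" + key);
    }

    boolean has(String namespace, String key) {
        return values.containsKey(namespace + ":" + key);
    }
}
```

Namespacing lets the engine reserve its own keys while clients add entries such as tags without colliding with them.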

I’m now working out which tag takes precedence, i.e. a system-supplied one or a client-supplied one. I will use the updatedTime field for this. This needs somewhat more complexity than I would expect a custom plugin to handle.
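One way the updatedTime-based precedence could work, sketched with a hypothetical TagVersion type (tag type name, updatedTime in epoch milliseconds, source). Keeping the system-supplied version on a tie is an assumed rule, not something decided in this thread.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tag model: a type name, a last-updated timestamp (epoch millis) and its source. */
final class TagVersion {
    final String type;
    final long updatedTime;
    final String source; // "system" or "client", kept so audit can report it

    TagVersion(String type, long updatedTime, String source) {
        this.type = type;
        this.updatedTime = updatedTime;
        this.source = source;
    }
}

final class TagPrecedence {
    /**
     * Merges system- and client-supplied tags per tag type, keeping the version with the
     * later updatedTime. On a tie the system-supplied version is kept (an assumed rule).
     */
    static List<TagVersion> merge(Collection<TagVersion> system, Collection<TagVersion> client) {
        Map<String, TagVersion> byType = new LinkedHashMap<>();
        for (TagVersion t : system) {
            byType.put(t.type, t);
        }
        for (TagVersion t : client) {
            TagVersion existing = byType.get(t.type);
            if (existing == null || t.updatedTime > existing.updatedTime) {
                byType.put(t.type, t);
            }
        }
        return new ArrayList<>(byType.values());
    }
}
```

This only resolves additions and updates per tag type; it does not let a client signal that a system-supplied tag has been removed.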


> 4. If needed, for advanced users, we can provide an interface or API to implement their own Tag Sync. This could be in addition to Atlas/Kafka or exclusive to their environment or Meta Store.

I suggest making it possible to have multiple syncs, with the ability to set the order in which they are evaluated and a hierarchy that determines which one can overwrite another. But this is for later.
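A sketch of ordered, layered tag syncs where a higher-priority source overwrites a lower one. TagSource, LayeredTagResolver and the integer priority scheme are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical pluggable tag source; "priority" decides which source may overwrite another. */
interface TagSource {
    String name();
    int priority(); // higher value wins
    Optional<Set<String>> tagsFor(String resource); // empty Optional = no opinion

    /** Convenience factory for fixed, in-memory sources (e.g. tests). */
    static TagSource fixed(String name, int priority, Map<String, Set<String>> tags) {
        return new TagSource() {
            public String name() { return name; }
            public int priority() { return priority; }
            public Optional<Set<String>> tagsFor(String resource) {
                return Optional.ofNullable(tags.get(resource));
            }
        };
    }
}

final class LayeredTagResolver {
    private final List<TagSource> sources = new ArrayList<>();

    void register(TagSource source) {
        sources.add(source);
        sources.sort(Comparator.comparingInt(TagSource::priority)); // ascending priority
    }

    /** Walks sources from lowest to highest priority; each answer overwrites the previous one. */
    Set<String> resolve(String resource) {
        Set<String> result = new HashSet<>();
        for (TagSource s : sources) {
            Optional<Set<String>> tags = s.tagsFor(resource);
            if (tags.isPresent()) {
                result = new HashSet<>(tags.get());
            }
        }
        return result;
    }
}
```

A source that returns an empty Optional has no opinion and leaves the lower-priority answer in place, while an empty set explicitly clears it.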


Re: Allow clients to supply tag information

Posted by Don Bosco Durai <bo...@apache.org>.

Hi Bolke

Thanks for the suggestion and contribution.

I am trying to understand your approach. Can you add your patch to Review Board? It will be easier to see the changes you have made visually.

I have a few questions and design suggestions.
1. When you say "Client", are you referring to "Plugin Implementation Code" or "End User" (similar to the Accumulo model)?
2. If you meant "Plugin" or "Custom Plugin", then I feel it is a good suggestion and we should support it. If it is end-user, then it is a longer discussion.
3. Based on the discussion so far and reviewing the code change at a high level, it seems you are extending at the Tag Enricher level. Alternatively, would it be more design-friendly to provide a method/API in the plugin interface to return or override the tags, e.g. getTags(request)? Custom plugins can override this method and alter the tags to be returned. This might be a more isolated and cleaner implementation, so that plugin writers can focus only on their plugin implementation.
4. If needed, for advanced users, we can provide an interface or API to implement their own Tag Sync. This could be in addition to Atlas/Kafka or exclusive to their environment or Meta Store.
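A minimal sketch of the getTags(request) hook suggested in point 3: the base implementation returns whatever the tag enricher supplied, and a tag-aware custom plugin overrides it. AccessRequest, BasePluginSketch and TagAwarePluginSketch are hypothetical names, not Ranger classes.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, minimal access request: a resource path plus the tags the enricher found. */
final class AccessRequest {
    final String resource;
    final Set<String> enricherTags;

    AccessRequest(String resource, Set<String> enricherTags) {
        this.resource = resource;
        this.enricherTags = enricherTags;
    }
}

/** Hypothetical plugin base class with an overridable getTags(request) hook. */
class BasePluginSketch {
    /** Default: use whatever the tag enricher (ranger-admin / tagsync) supplied. */
    Set<String> getTags(AccessRequest request) {
        return request.enricherTags;
    }
}

/** A tag-aware custom plugin that adds tags it already knows about locally. */
class TagAwarePluginSketch extends BasePluginSketch {
    private final Map<String, Set<String>> localTags;

    TagAwarePluginSketch(Map<String, Set<String>> localTags) {
        this.localTags = localTags;
    }

    @Override
    Set<String> getTags(AccessRequest request) {
        Set<String> tags = new HashSet<>(super.getTags(request));
        tags.addAll(localTags.getOrDefault(request.resource, Set.of()));
        return tags;
    }
}
```

Here the override adds local tags to the enricher's set; an override could equally replace them, which is where the question of which tag takes precedence comes back in.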

Thanks

Bosco

Re: Allow clients to supply tag information

Posted by Bolke de Bruin <bd...@gmail.com>.

Hi Abhay,

Also answers inline.

B.

Sent from my iPad

> On 5 Dec 2018, at 20:25, Abhay Kulkarni <ak...@hortonworks.com> wrote:
> 
> Hi Bolke,
> 
> My comments inline.
> 
> Thanks,
> -Abhay
> 
>> On 12/4/18, 1:07 PM, "Bolke de Bruin" <bd...@gmail.com> wrote:
>> 
>> Hi Abhay,
>> 
>> Good point on #1; I will take that into account if possible (can an enricher
>> call audit events?).
>> 
>> On #2: yes, otherwise the resource matcher will stop working. Maybe proper
>> namespacing is the way to go here. Implementing it this way ensures
>> backwards compatibility. More broadly, I think Ranger is lacking
>> here. Context could also be provided by the client and there is no really
>> clean way of doing this at the moment.
> 
> Abhay> I will need to take a look to figure out why the resource matcher will
> not work. However, instead of implementing a new API (removeValue()), is
> it possible to use the setValue() API to set the KEY_CLIENT_TAG entry to null?

I don’t think that is possible. The resource matcher checks for the presence of elements, and setting the value to null still leaves the key present, which means the signature still doesn’t match.
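The null-versus-absent distinction can be illustrated with plain java.util.Map semantics: putting a null value keeps the key in the map, so a check that relies on containsKey() or on the key set still sees it. This assumes the request context behaves like a Map and that the matcher looks at key presence; the key name below is a placeholder, not the patch's constant value.

```java
import java.util.HashMap;
import java.util.Map;

/** Contrasts "set to null" with "remove" on a Map-backed context (hypothetical helper). */
class NullVersusRemove {
    static final String KEY_CLIENT_TAGS = "CLIENT_TAGS"; // hypothetical key name

    /** Roughly what a setValue(key, null) call would amount to on a Map-backed context. */
    static Map<String, Object> afterSetNull(Map<String, Object> context) {
        Map<String, Object> copy = new HashMap<>(context);
        copy.put(KEY_CLIENT_TAGS, null); // key stays present, only the value is null
        return copy;
    }

    /** Roughly what a removeValue(key) API would do. */
    static Map<String, Object> afterRemove(Map<String, Object> context) {
        Map<String, Object> copy = new HashMap<>(context);
        copy.remove(KEY_CLIENT_TAGS); // key is gone entirely
        return copy;
    }
}
```

Only the removal restores the context to the same key set it had before the client added its tags.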

>> 
>> Question: should client tags apply only to SELF, or also to
>> SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.
> 
> Abhay> I don’t see any issue, at this time, to apply client-tags when
> match-type is SELF, SELF_OR_DESCENDENT or ANCESTOR.

This means a client tag will match against all of them at any time. The client isn’t aware of match-types. Correct?

>> 
>> Second question (a bit unrelated): how scalable is the tagsync approach?
>> If we have millions of tagged files and sources and they all end up being
>> registered in Ranger, this could easily grow exponentially, besides
>> getting outdated. The other approach could be to have this handled in the
>> client (pick up info from the TagSource, i.e. Atlas, and supply this to the
>> policy engine).
> 
> Abhay> I see that there is some lag involved. But, overall, the
> architecture allows for tag-based policies (really ABAC way of
> authorization) to be applied across all components uniformly. Having
> ranger-admin as a central repository of policies and tags, and components
> as simply clients downloading these artifacts has many more advantages
> than each component having to do all the work by itself. Also, any Kafka
> delay would be an issue even if components received tags directly
> from Atlas without ranger-admin mediating tag transfer. Moreover, there
> are several optimizations possible (such as incremental download of tags -
> not implemented yet) which can speed up tag downloads significantly. With
> a large number of tags, surely, the size of ranger-admin tag tables will
> increase, but IMO, it is a fair trade-off considering all other advantages
> this architecture provides us. Also, it will be useful to know the order
> of magnitude of delay you experienced (other than possibly up to 1 minute
> delay because of the interval between tag downloads).

The one minute is already too much for us. The example I gave happens within a few milliseconds so basically any delay is not acceptable.

To me it seems architecturally incorrect to have Ranger be a source for tags, as that is Atlas (or some other system). Ranger is duplicating things here rather than sticking to what it is good at: policies. Clients are already downloading tags; doing that from Atlas instead of Ranger does not add a lot of complexity and can be handled in the plugin transparently. But that is just my opinion.

Maybe there is a possibility to accept client tags as a temporary in Ranger that can then be overwritten by the Tag Store (ie. Atlas). Just thinking out loud.
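A minimal sketch of the "temporary client tags" idea, assuming a plugin-side cache keyed by resource path. ProvisionalTagCache and its methods are hypothetical names, not Ranger APIs: client tags are held as provisional entries until the tag store reports on the same resource.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Ranger code: client-supplied tags are kept as provisional
// entries and discarded as soon as the tag store (e.g. Atlas, via Ranger) reports
// tags for the same resource.
class ProvisionalTagCache {
    private final Map<String, Set<String>> fromStore = new HashMap<>();
    private final Map<String, Set<String>> fromClient = new HashMap<>();

    // A client reports tags for a resource it has just created or tagged.
    void putClientTags(String resource, Set<String> tags) {
        fromClient.put(resource, new HashSet<>(tags));
    }

    // A tag download from the store includes this resource: the store's view
    // replaces any provisional client tags.
    void putStoreTags(String resource, Set<String> tags) {
        fromStore.put(resource, new HashSet<>(tags));
        fromClient.remove(resource);
    }

    // Tags used for evaluation: provisional client tags while they exist,
    // otherwise whatever the store last reported (or none).
    Set<String> tagsFor(String resource) {
        Set<String> provisional = fromClient.get(resource);
        if (provisional != null) {
            return Collections.unmodifiableSet(provisional);
        }
        return Collections.unmodifiableSet(
                fromStore.getOrDefault(resource, Collections.<String>emptySet()));
    }
}
```

One caveat: a store download that predates the client's tagging would still clear the provisional entry, which is where comparing the tag's "updated" time, as suggested in the original post, would come in.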

Re: Allow clients to supply tag information

Posted by Abhay Kulkarni <ak...@hortonworks.com>.

Hi Bolke,

My comments inline.

Thanks,
-Abhay

On 12/4/18, 1:07 PM, "Bolke de Bruin" <bd...@gmail.com> wrote:

>Hi Abhay,
>
>Good point on #1; I will take that into account if possible (can an enricher
>call audit events?).
>
>On #2: yes, otherwise the resource matcher will stop working. Maybe proper
>namespacing is the way to go here. Implementing it this way ensures
>backwards compatibility. More broadly, I think Ranger is lacking here.
>Context could also be provided by the client, and there is no really
>clean way of doing this at the moment.

Abhay> I will need to take a look to figure out why the resource matcher
will not work. However, instead of implementing a new API (removeValue()),
is it possible to use the setValue() API to set the KEY_CLIENT_TAG entry to null?
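Whether "set to null" is enough depends on how the request context is read. A small illustration, assuming the context behaves like a plain java.util.HashMap; this does not use Ranger's classes, and the key value below is a placeholder:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration with a plain java.util.Map standing in for the request
// context; it contrasts setting an entry to null with removing it.
class ContextKeyDemo {
    static final String KEY_CLIENT_TAGS = "clientTags"; // placeholder, not Ranger's constant value

    static Map<String, Object> withClientTags() {
        Map<String, Object> ctx = new HashMap<>();
        ctx.put(KEY_CLIENT_TAGS, "PII");
        return ctx;
    }

    // Equivalent of setting the entry to null: the key stays in the map.
    static Map<String, Object> afterSetToNull() {
        Map<String, Object> ctx = withClientTags();
        ctx.put(KEY_CLIENT_TAGS, null);
        return ctx;
    }

    // Equivalent of removing the entry: the key disappears from the map.
    static Map<String, Object> afterRemove() {
        Map<String, Object> ctx = withClientTags();
        ctx.remove(KEY_CLIENT_TAGS);
        return ctx;
    }
}
```

Both maps return null from get(KEY_CLIENT_TAGS), but afterSetToNull() still contains the key (size() is 1), so null is only equivalent to removal if the resource matcher looks at values and never at keys.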
>
>Question: should client tags apply only to SELF, or also to
>SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.

Abhay> I don’t see any issue, at this time, with applying client tags when
the match type is SELF, SELF_OR_DESCENDENT or ANCESTOR.
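The rule agreed here can be written as an explicit allowlist. This is a sketch only: the enum below mirrors the names used in this thread and is not copied from Ranger's source.

```java
import java.util.EnumSet;
import java.util.Set;

// Match-type names as used in this thread; illustrative, not Ranger's own enum.
enum MatchType { NONE, SELF, SELF_OR_DESCENDENT, ANCESTOR }

// Hypothetical helper: client-supplied tags apply for every match type discussed
// here; only a non-matching resource (NONE) ignores them.
class ClientTagScope {
    private static final Set<MatchType> APPLIES_TO =
            EnumSet.of(MatchType.SELF, MatchType.SELF_OR_DESCENDENT, MatchType.ANCESTOR);

    static boolean clientTagsApply(MatchType matchType) {
        return APPLIES_TO.contains(matchType);
    }
}
```

Using an allowlist rather than "anything but NONE" means any match type added later has to be opted in deliberately.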
>
>Second question (a bit unrelated): how scalable is the tagsync approach?
>If we have millions of tagged files and sources, and they all end up being
>registered in Ranger, this could easily grow very large, not to mention
>getting outdated. The other approach could be to have this handled in the
>client (pick up info from the TagSource, i.e. Atlas, and supply this to the
>policy engine).

Abhay> I see that there is some lag involved. But, overall, the
architecture allows tag-based policies (really an ABAC way of
authorization) to be applied uniformly across all components. Having
ranger-admin as a central repository of policies and tags, with components
simply acting as clients that download these artifacts, has many more
advantages than each component having to do all the work by itself. Also,
any Kafka delay would still be an issue even if components received tags
directly from Atlas without ranger-admin mediating the tag transfer.
Moreover, several optimizations are possible (such as incremental download
of tags, not implemented yet) which can speed up tag downloads
significantly. With a large number of tags, the size of the ranger-admin
tag tables will surely increase, but IMO it is a fair trade-off
considering all the other advantages this architecture provides. Also, it
would be useful to know the order of magnitude of the delay you
experienced (other than the possible delay of up to 1 minute caused by the
interval between tag downloads).
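Incremental tag download was not implemented at the time of this thread. A rough sketch of how a plugin might apply version-to-version deltas instead of re-downloading everything, assuming a hypothetical TagDelta format (both class names are illustrative):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical delta format: everything that changed between two tag-data versions.
final class TagDelta {
    final long fromVersion;
    final long toVersion;
    final Map<String, Set<String>> upserts; // resource -> its complete new tag set
    final Set<String> deletes;               // resources that no longer have tags

    TagDelta(long fromVersion, long toVersion,
             Map<String, Set<String>> upserts, Set<String> deletes) {
        this.fromVersion = fromVersion;
        this.toVersion = toVersion;
        this.upserts = upserts;
        this.deletes = deletes;
    }
}

// Plugin-side cache that applies deltas in order instead of re-downloading all tags.
final class IncrementalTagCache {
    private final Map<String, Set<String>> tags = new HashMap<>();
    private long version = 0;

    long version() {
        return version;
    }

    // Returns false if the delta does not start at our version (a gap or a stale
    // delta); the caller would then fall back to a full download.
    boolean apply(TagDelta delta) {
        if (delta.fromVersion != version) {
            return false;
        }
        tags.putAll(delta.upserts);
        for (String resource : delta.deletes) {
            tags.remove(resource); // a resource in both lists ends up deleted
        }
        version = delta.toVersion;
        return true;
    }

    Set<String> tagsFor(String resource) {
        return tags.getOrDefault(resource, Collections.<String>emptySet());
    }
}
```

The transfer size then scales with the number of changed resources rather than the total number of tagged resources, which is the concern raised about millions of tagged files.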

Re: Allow clients to supply tag information

Posted by Bolke de Bruin <bd...@gmail.com>.

Hi Abhay,

Good point on #1; I will take that into account if possible (can an enricher call audit events?).

On #2: yes, otherwise the resource matcher will stop working. Maybe proper namespacing is the way to go here. Implementing it this way ensures backwards compatibility. More broadly, I think Ranger is lacking here. Context could also be provided by the client, and there is no really clean way of doing this at the moment.
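A minimal sketch of the enricher behaviour discussed in this thread, assuming client tags arrive as a collection under a single context key. It removes that key (so the resource matcher never sees it) and, per tag type, lets a client tag win only if its update time is newer, as the original post suggests for the "updated" field. Tag, ClientTagMerger and the key's value are hypothetical stand-ins, not Ranger's RangerTag or RangerTagEnricher API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tag; updateTime plays the role of the "updated" field.
final class Tag {
    final String type;
    final long updateTime;

    Tag(String type, long updateTime) {
        this.type = type;
        this.updateTime = updateTime;
    }
}

final class ClientTagMerger {
    static final String KEY_CLIENT_TAGS = "clientTags"; // placeholder key value

    // Consumes the client entry from the context and returns the tags to evaluate:
    // system tags, overridden per tag type by client tags with a newer update time.
    static List<Tag> enrich(Map<String, Object> context, Collection<Tag> systemTags) {
        Object raw = context.remove(KEY_CLIENT_TAGS); // later stages never see it
        Map<String, Tag> byType = new HashMap<>();
        for (Tag t : systemTags) {
            byType.put(t.type, t);
        }
        if (raw instanceof Collection<?>) {
            for (Object o : (Collection<?>) raw) {
                if (!(o instanceof Tag)) {
                    continue; // ignore malformed entries
                }
                Tag clientTag = (Tag) o;
                Tag existing = byType.get(clientTag.type);
                if (existing == null || clientTag.updateTime > existing.updateTime) {
                    byType.put(clientTag.type, clientTag);
                }
            }
        }
        return new ArrayList<>(byType.values());
    }
}
```

One design consequence: a per-type merge can add or refresh tags, but it cannot let a client remove a system tag, whereas overwriting the whole set (the patch's current behaviour) can.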

Question: should client tags apply only to SELF, or also to SELF_OR_DESCENDENT and ANCESTOR? I wasn’t sure here.

Second question (a bit unrelated): how scalable is the tagsync approach? If we have millions of tagged files and sources, and they all end up being registered in Ranger, this could easily grow very large, not to mention getting outdated. The other approach could be to have this handled in the client (pick up info from the TagSource, i.e. Atlas, and supply this to the policy engine).
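The alternative raised here, where the plugin pulls tags straight from the tag source, could look roughly like this, assuming a per-resource cache with a short TTL. The lookup function stands in for whatever client the plugin would use to talk to Atlas (no Atlas API is assumed), and DirectTagLookup is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical sketch, not Ranger or Atlas code: the plugin asks the tag source
// directly and caches each answer for ttlMillis. HashMap is not thread-safe; a
// real plugin would need a concurrent structure.
final class DirectTagLookup {
    private static final class Entry {
        final Set<String> tags;
        final long fetchedAt;

        Entry(Set<String> tags, long fetchedAt) {
            this.tags = tags;
            this.fetchedAt = fetchedAt;
        }
    }

    private final Function<String, Set<String>> source; // resource -> tags, e.g. backed by Atlas
    private final LongSupplier clockMillis;              // injectable clock for testing
    private final long ttlMillis;
    private final Map<String, Entry> cache = new HashMap<>();

    DirectTagLookup(Function<String, Set<String>> source, LongSupplier clockMillis, long ttlMillis) {
        this.source = source;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    Set<String> tagsFor(String resource) {
        long now = clockMillis.getAsLong();
        Entry entry = cache.get(resource);
        if (entry == null || now - entry.fetchedAt >= ttlMillis) {
            Set<String> fetched = source.apply(resource);
            entry = new Entry(fetched == null ? Collections.<String>emptySet() : fetched, now);
            cache.put(resource, entry);
        }
        return entry.tags;
    }
}
```

The TTL trades freshness against load on the tag source: a TTL of zero queries the source on every access check, while any positive TTL reintroduces a staleness window.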

Cheers
Bolke

Re: Allow clients to supply tag information

Posted by Abhay Kulkarni <ak...@hortonworks.com>.

Hi Bolke, 

This looks like a good addition to tag-based authorization in Ranger. I
will review the patch separately. However, here are a few thoughts.

1. If the client component is tag-aware and client-supplied tags overwrite
admin-supplied tags, audit needs to record this very clearly. This will
avoid any potential confusion about why the authorization decision was
different only for a certain component (or type of component).

2. Do the client-supplied tags have to be removed from the access request?
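For point 1, one way to make audit unambiguous is to carry the origin of every tag through evaluation. A minimal sketch, assuming each evaluated tag records whether it came from ranger-admin or from the client; TagOrigin, EvaluatedTag and TagAuditSummary are illustrative names, not Ranger's audit schema.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical: where a tag used in the decision came from.
enum TagOrigin { ADMIN, CLIENT }

final class EvaluatedTag {
    final String type;
    final TagOrigin origin;

    EvaluatedTag(String type, TagOrigin origin) {
        this.type = type;
        this.origin = origin;
    }
}

final class TagAuditSummary {
    // Builds a short note for the audit record listing any tags that came from
    // the client rather than from ranger-admin.
    static String describe(List<EvaluatedTag> tags) {
        List<String> clientTypes = new ArrayList<>();
        for (EvaluatedTag t : tags) {
            if (t.origin == TagOrigin.CLIENT) {
                clientTypes.add(t.type);
            }
        }
        if (clientTypes.isEmpty()) {
            return "tags: admin-supplied only";
        }
        Collections.sort(clientTypes); // stable, readable ordering
        return "tags: client-supplied " + String.join(",", clientTypes);
    }
}
```

For example, tags HR (admin), PII (client) and FIN (client) produce "tags: client-supplied FIN,PII".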

Thanks,
-Abhay
