Posted to user@ranger.apache.org by Lars Francke <la...@gmail.com> on 2020/01/23 12:49:08 UTC

Ranger policies best practices

Hi,

I'm wondering what the best practices for policies in Ranger are?
With Deny policies I'm not sure anymore.

The way I understand it I now need to

* add an ALLOW <group> policy
* add a DENY policy for the public group
* add a DENY EXCLUDE <group> policy

so that I can allow access for people from the <group>. That would be
three rules for one ALLOW.

We can disable the HDFS fallback but only globally.
What I had assumed so far (wrongly) is that as soon as a policy matches a
resource it is authoritative, i.e. if that policy doesn't allow access the
request is denied instead of falling through.

Is there anything I misunderstood and/or what are the best practices for
policies in Ranger these days?

I know this Wiki page (<
https://cwiki.apache.org/confluence/display/RANGER/How+Deny+Policies+Work+in+Apache+Ranger>)
but it misses exactly those corner cases.

I assume (from my experience with customers) that quite a few people are
actually using Ranger wrong if my understanding is correct.

Thanks for your help!

Cheers,
Lars

Re: Ranger policies best practices

Posted by Don Bosco Durai <bo...@apache.org>.
> If we only have ALLOW that does not mean DENY for people that have not been explicitly allowed, it means NOT_SPECIFIED (or similar is what it's called in the code) and the HDFS ACLs are checked.

You are correct. This is by design so that authorization plugins can be chained. When a plugin has no explicit DENY or ALLOW, evaluation goes to the next plugin. In the case of HDFS and YARN, we fall back to the native policies. YARN has a global switch to turn this off. HDFS is trickier: in some cases it would be too much work to manage the policies in Ranger, e.g. policies for the /tmp folder, service folders, etc.
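
A minimal sketch of the chaining idea, as a toy model and not Ranger's actual code: each authorizer answers ALLOW, DENY, or NOT_SPECIFIED, and only NOT_SPECIFIED hands the request to the next authorizer. The names `Decision`, `ranger_like`, `hdfs_native_like`, and `authorize`, as well as the sample rules, are assumptions made up for illustration.

```python
from enum import Enum
from typing import Callable, Iterable


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NOT_SPECIFIED = "not_specified"


# An authorizer maps (user, path) to a Decision. Both authorizers below are
# hypothetical stand-ins, not Ranger or HDFS APIs.
Authorizer = Callable[[str, str], Decision]


def ranger_like(user: str, path: str) -> Decision:
    # Toy rule set: one explicit ALLOW and one explicit DENY under /data.
    if path.startswith("/data"):
        if user == "alice":
            return Decision.ALLOW
        if user == "mallory":
            return Decision.DENY
    # No matching rule: no opinion, so the next authorizer gets to decide.
    return Decision.NOT_SPECIFIED


def hdfs_native_like(user: str, path: str) -> Decision:
    # Stand-in for native permission checks: only /tmp is open to everyone.
    return Decision.ALLOW if path.startswith("/tmp") else Decision.DENY


def authorize(chain: Iterable[Authorizer], user: str, path: str) -> Decision:
    for authorizer in chain:
        decision = authorizer(user, path)
        if decision is not Decision.NOT_SPECIFIED:
            return decision  # an explicit answer ends the chain
    return Decision.DENY  # nobody decided: deny by default (assumption)


chain = [ranger_like, hdfs_native_like]
for user, path in [("alice", "/data/x"), ("bob", "/data/x"), ("bob", "/tmp/f")]:
    print(user, path, authorize(chain, user, path).name)
```

Dropping `hdfs_native_like` from the chain models the global switch: every request without an explicit answer is then denied.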

 

The JIRA Madhan mentioned would be a good way to solve some specific use cases, like the way you have set it up (3 policies).

I feel that in the long run we should have something similar to Security Zone (or an option in SecurityZone itself), where we identify certain resources, e.g. /user, /hive/warehouse, /data_folders, etc. (or the inverse), to be managed exclusively by Ranger with no fallback. That way, without Ranger policies, users won’t get access to the resource. This might be a cleaner approach.

Bosco

From: Lars Francke <la...@gmail.com>
Reply-To: <us...@ranger.apache.org>
Date: Friday, January 24, 2020 at 12:31 AM
To: <us...@ranger.apache.org>
Subject: Re: Ranger policies best practices

Madhan,

thank you for the pointer. That looks promising! We'll try to get Ranger 2 running to evaluate.

Cheers,
Lars

On Fri, Jan 24, 2020 at 9:03 AM Madhan Neethiraj <ma...@apache.org> wrote:

> Lars,
>
> The enhancement in RANGER-2507 introduced the notion of “DenyAllElse”, which denies access to specified resources unless explicitly allowed by the policy. This should help address your use case. Please review.
>
> Madhan

Re: Ranger policies best practices

Posted by Lars Francke <la...@gmail.com>.
Madhan,

thank you for the pointer. That looks promising! We'll try to get Ranger 2
running to evaluate.

Cheers,
Lars

On Fri, Jan 24, 2020 at 9:03 AM Madhan Neethiraj <ma...@apache.org> wrote:

> Lars,
>
>
>
> The enhancement in RANGER-2507
> <https://issues.apache.org/jira/browse/RANGER-2507> introduced the notion
> of “DenyAllElse”, which denies access to specified resources unless
> explicitly allowed by the policy. This should help address your use case.
> Please review.
>
>
>
> Madhan
>
>
>
>
>
> *From: *Lars Francke <la...@gmail.com>
> *Reply-To: *"user@ranger.apache.org" <us...@ranger.apache.org>
> *Date: *Thursday, January 23, 2020 at 11:43 PM
> *To: *"user@ranger.apache.org" <us...@ranger.apache.org>
> *Subject: *Re: Ranger policies best practices
>
>
>
> Hi Bosco and thanks for the quick response!
>
>
>
> Ranger policy definitions have evolved over time to address more complex
> use cases. Can you come up with some real world use cases? We can try to
> come up with policies for them.
>
>
>
> Relatively simple:
>
> * If we have a policy for a resource (talking about HDFS) then we want to
> ALLOW only based on the Ranger policy and _not_ fall back on HDFS
>
> * If we do not have a policy for a resource we want the fallback
>
>
>
> At high level, here are key points;
>
>
>
>    - Deny policy anywhere (tag/resource level) trumps. Exception would be
>    conditional policies in Ranger 2.0
>    - Allow policy is needed for providing access to resource. Allow
>    policies are processed after all DENY policies are processed.
>
>
>
> In the flow you gave, you only need “ALLOW” policy.
>
> * add a ALLOW <group> policy
>
> * add a DENY public group
>
> * add a DENY EXCLUDE <group> policy
>
>
>
> I believe that's not correct but would be happy to be wrong myself ;-)
>
> But I think this was due to my earlier mail not being clear on what our
> requirements are (see above).
>
>
>
> If we only have ALLOW that does not mean DENY for people that have not
> been explicitly allowed, it means NOT_SPECIFIED (or similar is what it's
> called in the code) and the HDFS ACLs are checked.
>
> So to prevent HDFS checking we need the DENY "public" group but because
> that is checked before ALLOW we _also_ need DENY EXCLUDE.
>
>
>
> To sum it up: We want the fallback to HDFS to be configurable not just
> globally but per policy, and until yesterday I always assumed this was
> already the case.
>
>
>
> One example for DENY will be:
>
> Your company is hosting interns over the summer and they will be doing
> some machine learning projects. The interns will need access to your
> dataset, but your company policy doesn’t allow them to view PII data.
> However, one intern named Julia is an exception and may access
> PII data.
>
>
>
>    - Tag based policy: “DENY” all resources tagged as “PII” for group
>    “INTERN”
>    - Exclude user “Julia”
>    - Now for PII resources you want Julia to access, you give “ALLOW”
>    access to user “julia”
>
>
>
> Note, Exclude from DENY doesn’t mean the user will get the permission.
> There should be explicit ALLOW for the excluded user/group to access the
> resource.
>
>
>
> Cheers,
>
> Lars

Re: Ranger policies best practices

Posted by Madhan Neethiraj <ma...@apache.org>.
Lars,

The enhancement in RANGER-2507 introduced the notion of “DenyAllElse”, which denies access to specified resources unless explicitly allowed by the policy. This should help address your use case. Please review.
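
To illustrate the semantics described here, a small toy model (not Ranger code) of a policy that denies everything it does not explicitly allow. The `Policy` class, the `deny_all_else` field name, and the naive prefix matching are assumptions for this sketch; the actual attribute and matching rules should be checked against RANGER-2507.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    path_prefix: str
    allow_groups: set = field(default_factory=set)
    deny_all_else: bool = False  # hypothetical flag name for this sketch


def evaluate(policy: Policy, path: str, groups: set) -> str:
    # Naive prefix match; real resource matching is more involved.
    if not path.startswith(policy.path_prefix):
        return "NOT_SPECIFIED"
    if groups & policy.allow_groups:
        return "ALLOW"
    # Without the flag, an unmatched user yields no decision, so a fallback
    # (e.g. HDFS ACLs) would still be consulted. With it, the policy is final.
    return "DENY" if policy.deny_all_else else "NOT_SPECIFIED"


plain = Policy("/data/sales", allow_groups={"analysts"})
strict = Policy("/data/sales", allow_groups={"analysts"}, deny_all_else=True)

print(evaluate(plain, "/data/sales/q1", {"etl"}))
print(evaluate(strict, "/data/sales/q1", {"etl"}))
```

In this model a single policy replaces the ALLOW plus DENY plus DENY EXCLUDE combination, while resources without any matching policy still fall through.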

 

Madhan

 

 


Re: Ranger policies best practices

Posted by Lars Francke <la...@gmail.com>.
Hi Bosco and thanks for the quick response!

> Ranger policy definitions have evolved over time to address more complex
> use cases. Can you come up with some real world use cases? We can try to
> come up with policies for them.
>

Relatively simple:
* If we have a policy for a resource (talking about HDFS) then we want to
ALLOW only based on the Ranger policy and _not_ fall back on HDFS
* If we do not have a policy for a resource we want the fallback


> At high level, here are key points;
>
>
>
>    - Deny policy anywhere (tag/resource level) trumps. Exception would be
>    conditional policies in Ranger 2.0
>    - Allow policy is needed for providing access to resource. Allow
>    policies are processed after all DENY policies are processed.
>
>
>
> In the flow you gave, you only need “ALLOW” policy.
>
> * add a ALLOW <group> policy
>
> * add a DENY public group
>
> * add a DENY EXCLUDE <group> policy
>

I believe that's not correct but would be happy to be wrong myself ;-)
But I think this was due to my earlier mail not being clear on what our
requirements are (see above).

If we only have ALLOW that does not mean DENY for people that have not been
explicitly allowed, it means NOT_SPECIFIED (or similar is what it's called
in the code) and the HDFS ACLs are checked.
So to prevent HDFS checking we need the DENY "public" group but because
that is checked before ALLOW we _also_ need DENY EXCLUDE.
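
The reasoning above can be sketched with a toy model (not Ranger's implementation): DENY items are checked first, unless the caller is in the DENY EXCLUDE list, then ALLOW items, and anything left over is NOT_SPECIFIED and goes to the HDFS check. The `Policy` class, the function names, and the `hdfs_acl_allows` parameter are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    allow: set = field(default_factory=set)
    deny: set = field(default_factory=set)
    deny_exclude: set = field(default_factory=set)


def ranger_decision(policy: Policy, groups: set) -> str:
    # In this model every user is implicitly a member of "public".
    groups = set(groups) | {"public"}
    # DENY is evaluated before ALLOW, but excluded groups skip the DENY.
    if groups & policy.deny and not groups & policy.deny_exclude:
        return "DENY"
    if groups & policy.allow:
        return "ALLOW"
    return "NOT_SPECIFIED"


def access(policy: Policy, groups: set, hdfs_acl_allows: bool) -> str:
    decision = ranger_decision(policy, groups)
    if decision == "NOT_SPECIFIED":
        # Fallback: the native HDFS permissions decide.
        return "ALLOW" if hdfs_acl_allows else "DENY"
    return decision


allow_only = Policy(allow={"analysts"})
deny_public = Policy(allow={"analysts"}, deny={"public"})
three_rules = Policy(allow={"analysts"}, deny={"public"},
                     deny_exclude={"analysts"})

for name, policy in [("allow_only", allow_only),
                     ("deny_public", deny_public),
                     ("three_rules", three_rules)]:
    print(name,
          "analyst:", access(policy, {"analysts"}, hdfs_acl_allows=True),
          "outsider:", access(policy, {"etl"}, hdfs_acl_allows=True))
```

Only the three-rule variant lets the analyst in while keeping the outsider out when the HDFS ACLs would otherwise permit access.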

To sum it up: We want the fallback to HDFS to be configurable not just
globally but per policy, and until yesterday I always assumed this was
already the case.

> One example for DENY will be:
>
> Your company is hosting interns over the summer and they will be doing
> some machine learning projects. The interns will need access to your
> dataset, but your company policy doesn’t allow them to view PII data.
> However, one intern named Julia is an exception and may access
> PII data.
>
>
>
>    - Tag based policy: “DENY” all resources tagged as “PII” for group
>    “INTERN”
>    - Exclude user “Julia”
>    - Now for PII resources you want Julia to access, you give “ALLOW”
>    access to user “julia”
>
>
>
> Note, Exclude from DENY doesn’t mean the user will get the permission.
> There should be explicit ALLOW for the excluded user/group to access the
> resource.
>

Cheers,
Lars



Re: Ranger policies best practices

Posted by Don Bosco Durai <bo...@apache.org>.
Hi Lars,

Ranger policy definitions have evolved over time to address more complex use cases. Can you come up with some real world use cases? We can try to come up with policies for them.

At a high level, here are the key points:

   - A Deny policy anywhere (tag/resource level) trumps. The exception would be conditional policies in Ranger 2.0.
   - An Allow policy is needed to provide access to a resource. Allow policies are processed after all DENY policies are processed.
 

In the flow you gave, you only need the “ALLOW” policy.

* add an ALLOW <group> policy
* add a DENY policy for the public group
* add a DENY EXCLUDE <group> policy

 

One example for DENY would be:

Your company is hosting interns over the summer and they will be doing some machine learning projects. The interns will need access to your dataset, but your company policy doesn’t allow them to view PII data. However, one intern named Julia is an exception and may access PII data.

   - Tag-based policy: “DENY” all resources tagged as “PII” for group “INTERN”
   - Exclude user “Julia”
   - Now, for the PII resources you want Julia to access, give “ALLOW” access to user “julia”

Note: Exclude from DENY doesn’t mean the user will get the permission. There should be an explicit ALLOW for the excluded user/group to access the resource.
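
A small sketch of the intern example, written as a toy model and not as Ranger code. The function name `evaluate_pii_access` and its parameters are assumptions for illustration. It shows that the exclusion only removes the DENY, and that access still depends on a separate ALLOW.

```python
def evaluate_pii_access(user: str, groups: set, tags: set,
                        allowed_users: set) -> str:
    # Tag-based DENY item: resources tagged PII, group INTERN,
    # with user "julia" excluded from the deny.
    if "PII" in tags and "INTERN" in groups and user != "julia":
        return "DENY"
    # Resource-level ALLOW item for explicitly listed users.
    if user in allowed_users:
        return "ALLOW"
    # Being excluded from DENY grants nothing by itself.
    return "NOT_SPECIFIED"


interns = {"INTERN"}
print(evaluate_pii_access("tom", interns, {"PII"}, {"julia"}))
print(evaluate_pii_access("julia", interns, {"PII"}, {"julia"}))
print(evaluate_pii_access("julia", interns, {"PII"}, set()))
```

The last call models Julia being excluded from the DENY without a matching ALLOW: the result is no decision rather than access.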

 

 

Bosco

 

 
