Posted to dev@ranger.apache.org by Smit Shah <sm...@zillowgroup.com> on 2020/08/26 22:59:26 UTC

Help: Tag based policy for non-Atlas solution

cc: Team members who created the Confluence wiki pages that I have referred to

Hi Apache Ranger Dev Team,

I am Smit Shah, working at Zillow<https://www.zillow.com/corp/About.htm> as a Data Engineer. My team is working on Data Governance around Apache Hive. We came across Apache Ranger, and one of the key features we like is Tag Based Policies, which we are really interested in leveraging. :)

While going through the documentation for Tag Based Policies<https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies>, I found that Tag Sync has native support for Apache Atlas. However, our team already has its own tag store, and we are trying to avoid adding another layer. So I am checking with the team whether there are any examples, blogs, or documentation you can share that would help us to:
1. Store tags
2. Make tag-based policies work in Apache Ranger with a non-Atlas solution

Some web-pages that I came across during my initial investigation:
1. Context enrichers<https://cwiki.apache.org/confluence/display/RANGER/Dynamic+Policy+Hooks+in+Ranger+-+Configure+and+Use> – Not sure whether this is important for my use case
2. Installing Tag Synchronizer<https://cwiki.apache.org/confluence/display/RANGER/Tag+Synchronizer+Installation+and+Configuration> – How can this be made to work for a non-Atlas solution?
3. Ranger API<https://ranger.apache.org/apidocs/index.html> – This might be needed for storing tags: we could create a service that reads data from our tag store and calls this endpoint to store it in Ranger in the required format.

Your help and any details would be really valuable to us. Sending an email seemed like the best way to reach out to the team. Thank you very much in advance. :)

SMIT SHAH
SDE, Big Data
Pronouns: he/him/his


Re: Help: Tag based policy for non-Atlas solution

Posted by Smit Shah <sm...@zillowgroup.com>.
Hi Madhan,

Thanks for confirming that the other two solutions are also feasible. These are great insights for us. :)

SMIT SHAH
SDE, Big Data
Pronouns: he/him/his


From: Madhan Neethiraj <ma...@apache.org>
Date: Sunday, September 6, 2020 at 2:50 PM
To: Smit Shah <sm...@zillowgroup.com>, "dev@ranger.apache.org" <de...@ranger.apache.org>
Cc: "abhay@apache.org" <ab...@apache.org>, "bganesan@apache.org" <bg...@apache.org>
Subject: Re: Help: Tag based policy for non-Atlas solution

Smit,

I understand the reasoning for leveraging the existing Ranger tag-sync and tag-store implementation instead of going with a custom context enricher. While this is feasible, it will require the use of internal APIs, which could change in future releases. If you still want to provide an alternate source for tags, I suggest extending org.apache.ranger.tagsync.model.AbstractTagSource, similar to AtlasTagSource, and registering it with the following configuration in ranger-tagsync-site.xml:
ranger.tagsync.source.<name-of-your-source>=true
ranger.tagsync.source.<name-of-your-source>.class=<implementation-class-name>
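
For reference, ranger-tagsync-site.xml uses the Hadoop-style XML configuration layout, so the two properties above might look like the sketch below. The source name "customstore" and the class name "com.example.ranger.CustomTagSource" are hypothetical placeholders, not names defined by Ranger. The property keys are taken as given above and should be checked against the Ranger release in use.

```xml
<configuration>
  <!-- Enable the custom tag source; "customstore" is a hypothetical source name. -->
  <property>
    <name>ranger.tagsync.source.customstore</name>
    <value>true</value>
  </property>
  <!-- Hypothetical class, assumed to extend org.apache.ranger.tagsync.model.AbstractTagSource. -->
  <property>
    <name>ranger.tagsync.source.customstore.class</name>
    <value>com.example.ranger.CustomTagSource</value>
  </property>
</configuration>
```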

Hope this helps.

Madhan

From: Smit Shah <sm...@zillowgroup.com>
Date: Tuesday, September 1, 2020 at 4:06 PM
To: Madhan Neethiraj <ma...@apache.org>, "dev@ranger.apache.org" <de...@ranger.apache.org>
Cc: "abhay@apache.org" <ab...@apache.org>, "bganesan@apache.org" <bg...@apache.org>
Subject: Re: Help: Tag based policy for non-Atlas solution

Hi Madhan,

Thank you for writing back with a suggestion.

I would like to get some more insight into a few options, plus a general question, based on your suggestion and some further investigation.

Option A: The solution you suggested (it’s really helpful)
With this, we would not be leveraging the ranger-tagsync process or the tag-related tables (ranger.x_tag*) that Ranger maintains. I can think of two challenges we would need to tackle:

  1.  Given our high request volume, the endpoint that retrieves tags for a resource needs to be highly available, fast, and able to handle concurrent requests.
  2.  If the endpoint or our tag store is down, the lookup will fail, and we have to decide whether to deny the resource request or let it pass through.
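
One way to picture the two challenges above is a small cache in front of the tag store with an explicit failure policy. The following is only a minimal sketch under stated assumptions: CachedTagLookup, OnFailure, and Result are hypothetical names, none of them are Ranger classes, and the tag store is modeled as a plain function that throws a RuntimeException when it is unavailable. A real context enricher would wrap something like this and hand the resulting tags to Ranger.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical helper (not a Ranger class): caches tag lookups and applies a failure policy. */
final class CachedTagLookup {
    /** What to do when the tag store is down and nothing is cached for the resource. */
    enum OnFailure { DENY, PASS_THROUGH }

    /** Outcome of a lookup: the tags found, and whether the request should be denied outright. */
    static final class Result {
        final Set<String> tags;
        final boolean deny;
        Result(Set<String> tags, boolean deny) { this.tags = tags; this.deny = deny; }
    }

    /** Cached tags plus the time they were loaded. */
    private static final class Entry {
        final Set<String> tags;
        final long loadedAtMillis;
        Entry(Set<String> tags, long loadedAtMillis) { this.tags = tags; this.loadedAtMillis = loadedAtMillis; }
    }

    private final Function<String, Set<String>> tagStore; // assumed to throw RuntimeException when the store is down
    private final long ttlMillis;
    private final OnFailure onFailure;
    private final Map<String, Entry> cache = new ConcurrentHashMap<>();

    CachedTagLookup(Function<String, Set<String>> tagStore, long ttlMillis, OnFailure onFailure) {
        this.tagStore = tagStore;
        this.ttlMillis = ttlMillis;
        this.onFailure = onFailure;
    }

    /** Looks up tags for a resource; the current time is passed in so the TTL logic is easy to test. */
    Result lookup(String resource, long nowMillis) {
        Entry cached = cache.get(resource);
        if (cached != null && nowMillis - cached.loadedAtMillis < ttlMillis) {
            return new Result(cached.tags, false); // fresh cache hit: no call to the tag store
        }
        try {
            Set<String> tags = tagStore.apply(resource);
            cache.put(resource, new Entry(tags, nowMillis));
            return new Result(tags, false);
        } catch (RuntimeException e) {
            if (cached != null) {
                return new Result(cached.tags, false); // store is down: fall back to the stale entry
            }
            // Nothing cached: apply the configured policy (deny, or pass through with no tags).
            return new Result(Collections.<String>emptySet(), onFailure == OnFailure.DENY);
        }
    }

    public static void main(String[] args) {
        CachedTagLookup lookup = new CachedTagLookup(
                resource -> Collections.singleton("PII"), 60_000L, OnFailure.DENY);
        Result r = lookup.lookup("sales.customers", System.currentTimeMillis());
        System.out.println(r.tags + " deny=" + r.deny);
    }
}
```

Whether to deny or pass through when no tags are available is a policy decision; this sketch only makes that choice explicit and keeps serving stale tags while the store is unreachable.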

Option B: Leveraging ranger-tagsync process
Similar to how Ranger listens to Atlas’s Kafka topic, we can create an Apache Kafka topic for our tag store’s change notifications and let the ranger-tagsync process listen to it. This would let us skip Option A.
Many of the property names defined in install.properties<https://cwiki.apache.org/confluence/display/RANGER/Tag+Synchronizer+Installation+and+Configuration> are specific to Atlas, so I am not sure whether ranger-tagsync is designed specifically for Atlas.
Can you think of any challenges here?

Option C: Storing our tags directly inside Ranger’s internal tag store
There are endpoints<https://ranger.apache.org/apidocs/index.html> provided by Ranger that we can leverage. So, instead of implementing a context enricher (Option A), we can store our tags inside Ranger’s tag store and let Ranger work the normal way.
Can you think of any challenges here?


General question:
Do Ranger plugins also keep a cached copy of Ranger’s internal tag store, in addition to policies? I am trying to see whether there are benefits to putting our tag details inside Ranger’s tag store.


Overall, Option B seems like the better option to me, if it is possible to implement.


SMIT SHAH
SDE, Big Data
Pronouns: he/him/his


From: Madhan Neethiraj <ma...@apache.org>
Date: Monday, August 31, 2020 at 1:28 AM
To: Smit Shah <sm...@zillowgroup.com>, "dev@ranger.apache.org" <de...@ranger.apache.org>
Cc: "madhan@apache.org" <ma...@apache.org>, "abhay@apache.org" <ab...@apache.org>, "bganesan@apache.org" <bg...@apache.org>
Subject: Re: Help: Tag based policy for non-Atlas solution

Smit,

I suggest implementing a context enricher that retrieves tags from your tag store and sets the tags for the resource in the request context, with a call to RangerAccessRequestUtil.setRequestTagsInContext(context, tags). The tag service-def<https://github.com/apache/ranger/blob/master/agents-common/src/main/resources/service-defs/ranger-servicedef-tag.json#L55> should be updated to register this context enricher instead of the current enricher implementation (RangerAdminTagRetriever).

Hope this helps.

Madhan


