You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@sentry.apache.org by Colin Ma <ju...@intel.com> on 2014/12/08 13:14:10 UTC

Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/
-----------------------------------------------------------

Review request for sentry.


Repository: sentry


Description
-------

Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:

......
for (String tableName : queryResult) {
  ......
  hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
            outputHierarchy, providedPrivileges);
  ......
}
......

hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.


Diffs
-----

  sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/HiveAuthzBindingHook.java 97ef3b8 
  sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/HiveAuthzBindingPreExecHook.java 813200a 
  sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java b4b69e1 
  sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/metastore/MetastoreAuthzBinding.java e7f96c1 
  sentry-binding/sentry-binding-hive/src/test/java/org/apache/sentry/binding/hive/TestHiveAuthzBindings.java d41f6cf 
  sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/AuthorizationProvider.java a88d2f8 
  sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/NoAuthorizationProvider.java a814527 
  sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 6449405 

Diff: https://reviews.apache.org/r/28800/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Lenni Kuff <ls...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/#review101398
-----------------------------------------------------------



sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/HiveAuthzBindingHook.java (line 668)
<https://reviews.apache.org/r/28800/#comment158808>

    Can you add a brief comment on this method and explain the return value?
    
    Maybe rename to getActivePrivilegesForUser()?



sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/AuthorizationProvider.java (line 67)
<https://reviews.apache.org/r/28800/#comment158810>

    This is kind of a strange API to me. The caller of this class shouldn't really care about what the privileges are and why would they should rely on Sentry as the source of truth.
    I don't know if we want to add this to our main AuthorizationProvider class. 
    
    One thing to also look at is the PrivilegeCache / SimpleCacheProviderBackend classes. They are not quite what you are looking for, but do allows you to inject privileges. In your case you could implement a version of that class that just calls getPrivilegesForUser()...



sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java (line 149)
<https://reviews.apache.org/r/28800/#comment158809>

    I feel like the previous API was much more structured. You have just encoded this structure into strings, which is going to be harder to debug.
    I also don't understand why you would call getPrivileges and pass set of privileges.
    
    Think a bit more about the naming and see if you can find a way to avoid the convertion to strings .


- Lenni Kuff


On Dec. 8, 2014, 12:14 p.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28800/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2014, 12:14 p.m.)
> 
> 
> Review request for sentry.
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:
> 
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
>             outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> 
> hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.
> 
> 
> Diffs
> -----
> 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/HiveAuthzBindingHook.java 97ef3b8 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/HiveAuthzBindingPreExecHook.java 813200a 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java b4b69e1 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/metastore/MetastoreAuthzBinding.java e7f96c1 
>   sentry-binding/sentry-binding-hive/src/test/java/org/apache/sentry/binding/hive/TestHiveAuthzBindings.java d41f6cf 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/AuthorizationProvider.java a88d2f8 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/NoAuthorizationProvider.java a814527 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 6449405 
> 
> Diff: https://reviews.apache.org/r/28800/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Colin Ma <ju...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/
-----------------------------------------------------------

(Updated Oct. 28, 2015, 7:42 a.m.)


Review request for sentry and Lenni Kuff.


Repository: sentry


Description
-------

Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:

......
for (String tableName : queryResult) {
  ......
  hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
            outputHierarchy, providedPrivileges);
  ......
}
......

hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.


Diffs (updated)
-----

  sentry-provider/sentry-provider-cache/pom.xml c67f094 
  sentry-provider/sentry-provider-cache/src/main/java/org/apache/sentry/provider/cache/CachedPrivilegeWrap.java PRE-CREATION 
  sentry-provider/sentry-provider-cache/src/main/java/org/apache/sentry/provider/cache/DBPrivilegeCache.java PRE-CREATION 
  sentry-provider/sentry-provider-cache/src/main/java/org/apache/sentry/provider/cache/SimpleCacheProviderBackend.java 4b98447 
  sentry-provider/sentry-provider-cache/src/test/java/org/apache/sentry/provider/cache/TestCacheProvider.java e5b29b8 
  sentry-provider/sentry-provider-db/src/gen/thrift/gen-javabean/org/apache/sentry/provider/db/service/thrift/SentryPolicyService.java 0c24449 
  sentry-provider/sentry-provider-db/src/gen/thrift/gen-javabean/org/apache/sentry/provider/db/service/thrift/TListSentryPrivilegesForCachedRequest.java PRE-CREATION 
  sentry-provider/sentry-provider-db/src/gen/thrift/gen-javabean/org/apache/sentry/provider/db/service/thrift/TListSentryPrivilegesForCachedResponse.java PRE-CREATION 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/persistent/SentryStore.java fbb611e 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/thrift/SentryPolicyServiceClient.java cbc0aaf 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/thrift/SentryPolicyServiceClientDefaultImpl.java 74f379a 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/service/thrift/SentryPolicyStoreProcessor.java 4f8c834 
  sentry-provider/sentry-provider-db/src/main/resources/sentry_policy_service.thrift 40889e8 

Diff: https://reviews.apache.org/r/28800/diff/


Testing
-------


Thanks,

Colin Ma


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Colin Ma <ju...@intel.com>.

> On Oct. 21, 2015, 12:48 a.m., Sravya Tirukkovalur wrote:
> > Thanks for the update Colin! Will it be possible for you to add description on how you are solving the problem. It would be easy to follow along and also would help review the approach before delving into the code details.
> 
> Colin Ma wrote:
>     hi, Sravya, the following is the current idea for the cached privilege:
>     1. CachedPrivilegeManagement is responsible for getPrivileges, and it likes an agent for providerBackend. CachedPrivilegeManagement will be initialized in HiveAuthzBinding according to the configuration file.
>     2. All cached privileges are stored as HashMap in CachedPrivilegeManagement. The key for this map is username, and the value is wrapped in the CachedPrivilege including all privileges for this username and the timestamp when refresh the privilege.
>     3. When do the authorization, SimpleDBPolicyEngine.getPrivileges will be called, and CachedPrivilegeManagement.getPrivileges will be called. If CachedPrivilegeManagement is initialized and cached privilege enable in configuraion, CachedPrivilegeManagement.getPrivileges will get the privileges from cache first, if not exist or expire, get the privileges by backend and refresh the cache. The expired time can be configured and default is 5 mins.
>     
>     The core part for this problem is: HiveAuthzBinding, CachedPrivilege, CachedPrivilegeManagement, SimpleDBPolicyEngine, TestCachedPrivilege. The other code is just for the interface change.
>     The disadvantage for this solution is if admin change the privilege for user, this will not be effect immediately. There will be delay according to the expire time setting in configuration file. I think this is accepted for the authorization management.
>     
>     Feel free for any comments, thanks for your review.
> 
> Sravya Tirukkovalur wrote:
>     Hi Colin,
>     
>     I think the main problem here is predicate push down. We should focus on what is the best way to be able to send the tables/ db names in the RPC when doing auth check instead of doing auth checks for each table/dbname. Also, we already have a cache provider, if we decide to improve performance of hive binding by moving to a cache provider, we should move everything to using a cache provider and also have a way to refresh this cache. Moving just one path to cache provider does not seem like a right approach as it can lead to behavior which is hard to reason.

Hi, Sravya,

Thanks for the comments, change the implementation to send table/db names in the RPC will be a lots of changes on the core code(eg, SentryStore), this can be a seperate task if necessary.
I think cache privilege is very useful for Sentry, and I'll implement this with the PrivilegeCache.
The main idea will be the same as my description above, maybe some changes on the data structure for cached privilege, using Map<group, cachedPrivilege> instead of Map<username, cachedPrivilege> to reduce the duplicated privileges.


- Colin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/#review103333
-----------------------------------------------------------


On Oct. 16, 2015, 11:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28800/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2015, 11:18 a.m.)
> 
> 
> Review request for sentry and Lenni Kuff.
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:
> 
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
>             outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> 
> hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.
> 
> 
> Diffs
> -----
> 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java 3071475 
>   sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/PolicyEngine.java 38a5b65 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilege.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilegeManagement.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/SimpleDBPolicyEngine.java a03794e 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/AbstractTestSimplePolicyEngine.java d1151e3 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestPolicyParsingNegative.java 5f7c671 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestSimpleDBPolicyEngineDFS.java f8c36e2 
>   sentry-policy/sentry-policy-indexer/src/main/java/org/apache/sentry/policy/indexer/SimpleIndexerPolicyEngine.java 2f4bc1d 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/AbstractTestIndexerPolicyEngine.java d7d1ae2 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/TestIndexerPolicyNegative.java 0706560 
>   sentry-policy/sentry-policy-search/src/main/java/org/apache/sentry/policy/search/SimpleSearchPolicyEngine.java bc518fb 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/AbstractTestSearchPolicyEngine.java d1c415b 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/TestSearchPolicyNegative.java 2abe8f2 
>   sentry-policy/sentry-policy-sqoop/src/main/java/org/apache/sentry/policy/sqoop/SimpleSqoopPolicyEngine.java e8615a0 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/AbstractTestSqoopPolicyEngine.java 1389fca 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/TestSqoopPolicyNegative.java 406e53f 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 06573b7 
>   sentry-provider/sentry-provider-common/src/test/java/org/apache/sentry/provider/common/TestGetGroupMapping.java f57198a 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java bc35742 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestCachedPrivilege.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hive/AbstractTestWithStaticConfiguration.java cc5daef 
> 
> Diff: https://reviews.apache.org/r/28800/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Colin Ma <ju...@intel.com>.

> On Oct. 21, 2015, 12:48 a.m., Sravya Tirukkovalur wrote:
> > Thanks for the update Colin! Will it be possible for you to add description on how you are solving the problem. It would be easy to follow along and also would help review the approach before delving into the code details.

hi, Sravya, the following is the current idea for the cached privilege:
1. CachedPrivilegeManagement is responsible for getPrivileges, and it likes an agent for providerBackend. CachedPrivilegeManagement will be initialized in HiveAuthzBinding according to the configuration file.
2. All cached privileges are stored as HashMap in CachedPrivilegeManagement. The key for this map is username, and the value is wrapped in the CachedPrivilege including all privileges for this username and the timestamp when refresh the privilege.
3. When do the authorization, SimpleDBPolicyEngine.getPrivileges will be called, and CachedPrivilegeManagement.getPrivileges will be called. If CachedPrivilegeManagement is initialized and cached privilege enable in configuraion, CachedPrivilegeManagement.getPrivileges will get the privileges from cache first, if not exist or expire, get the privileges by backend and refresh the cache. The expired time can be configured and default is 5 mins.

The core part for this problem is: HiveAuthzBinding, CachedPrivilege, CachedPrivilegeManagement, SimpleDBPolicyEngine, TestCachedPrivilege. The other code is just for the interface change.
The disadvantage for this solution is if admin change the privilege for user, this will not be effect immediately. There will be delay according to the expire time setting in configuration file. I think this is accepted for the authorization management.

Feel free for any comments, thanks for your review.


- Colin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/#review103333
-----------------------------------------------------------


On Oct. 16, 2015, 11:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28800/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2015, 11:18 a.m.)
> 
> 
> Review request for sentry and Lenni Kuff.
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:
> 
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
>             outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> 
> hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.
> 
> 
> Diffs
> -----
> 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java 3071475 
>   sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/PolicyEngine.java 38a5b65 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilege.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilegeManagement.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/SimpleDBPolicyEngine.java a03794e 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/AbstractTestSimplePolicyEngine.java d1151e3 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestPolicyParsingNegative.java 5f7c671 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestSimpleDBPolicyEngineDFS.java f8c36e2 
>   sentry-policy/sentry-policy-indexer/src/main/java/org/apache/sentry/policy/indexer/SimpleIndexerPolicyEngine.java 2f4bc1d 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/AbstractTestIndexerPolicyEngine.java d7d1ae2 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/TestIndexerPolicyNegative.java 0706560 
>   sentry-policy/sentry-policy-search/src/main/java/org/apache/sentry/policy/search/SimpleSearchPolicyEngine.java bc518fb 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/AbstractTestSearchPolicyEngine.java d1c415b 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/TestSearchPolicyNegative.java 2abe8f2 
>   sentry-policy/sentry-policy-sqoop/src/main/java/org/apache/sentry/policy/sqoop/SimpleSqoopPolicyEngine.java e8615a0 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/AbstractTestSqoopPolicyEngine.java 1389fca 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/TestSqoopPolicyNegative.java 406e53f 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 06573b7 
>   sentry-provider/sentry-provider-common/src/test/java/org/apache/sentry/provider/common/TestGetGroupMapping.java f57198a 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java bc35742 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestCachedPrivilege.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hive/AbstractTestWithStaticConfiguration.java cc5daef 
> 
> Diff: https://reviews.apache.org/r/28800/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Sravya Tirukkovalur <sr...@cloudera.com>.

> On Oct. 21, 2015, 12:48 a.m., Sravya Tirukkovalur wrote:
> > Thanks for the update Colin! Will it be possible for you to add description on how you are solving the problem. It would be easy to follow along and also would help review the approach before delving into the code details.
> 
> Colin Ma wrote:
>     hi, Sravya, the following is the current idea for the cached privilege:
>     1. CachedPrivilegeManagement is responsible for getPrivileges, and it likes an agent for providerBackend. CachedPrivilegeManagement will be initialized in HiveAuthzBinding according to the configuration file.
>     2. All cached privileges are stored as HashMap in CachedPrivilegeManagement. The key for this map is username, and the value is wrapped in the CachedPrivilege including all privileges for this username and the timestamp when refresh the privilege.
>     3. When do the authorization, SimpleDBPolicyEngine.getPrivileges will be called, and CachedPrivilegeManagement.getPrivileges will be called. If CachedPrivilegeManagement is initialized and cached privilege enable in configuraion, CachedPrivilegeManagement.getPrivileges will get the privileges from cache first, if not exist or expire, get the privileges by backend and refresh the cache. The expired time can be configured and default is 5 mins.
>     
>     The core part for this problem is: HiveAuthzBinding, CachedPrivilege, CachedPrivilegeManagement, SimpleDBPolicyEngine, TestCachedPrivilege. The other code is just for the interface change.
>     The disadvantage for this solution is if admin change the privilege for user, this will not be effect immediately. There will be delay according to the expire time setting in configuration file. I think this is accepted for the authorization management.
>     
>     Feel free for any comments, thanks for your review.

Hi Colin,

I think the main problem here is predicate push down. We should focus on what is the best way to be able to send the tables/ db names in the RPC when doing auth check instead of doing auth checks for each table/dbname. Also, we already have a cache provider, if we decide to improve performance of hive binding by moving to a cache provider, we should move everything to using a cache provider and also have a way to refresh this cache. Moving just one path to cache provider does not seem like a right approach as it can lead to behavior which is hard to reason.


- Sravya


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/#review103333
-----------------------------------------------------------


On Oct. 16, 2015, 11:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28800/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2015, 11:18 a.m.)
> 
> 
> Review request for sentry and Lenni Kuff.
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:
> 
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
>             outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> 
> hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.
> 
> 
> Diffs
> -----
> 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java 3071475 
>   sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/PolicyEngine.java 38a5b65 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilege.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilegeManagement.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/SimpleDBPolicyEngine.java a03794e 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/AbstractTestSimplePolicyEngine.java d1151e3 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestPolicyParsingNegative.java 5f7c671 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestSimpleDBPolicyEngineDFS.java f8c36e2 
>   sentry-policy/sentry-policy-indexer/src/main/java/org/apache/sentry/policy/indexer/SimpleIndexerPolicyEngine.java 2f4bc1d 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/AbstractTestIndexerPolicyEngine.java d7d1ae2 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/TestIndexerPolicyNegative.java 0706560 
>   sentry-policy/sentry-policy-search/src/main/java/org/apache/sentry/policy/search/SimpleSearchPolicyEngine.java bc518fb 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/AbstractTestSearchPolicyEngine.java d1c415b 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/TestSearchPolicyNegative.java 2abe8f2 
>   sentry-policy/sentry-policy-sqoop/src/main/java/org/apache/sentry/policy/sqoop/SimpleSqoopPolicyEngine.java e8615a0 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/AbstractTestSqoopPolicyEngine.java 1389fca 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/TestSqoopPolicyNegative.java 406e53f 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 06573b7 
>   sentry-provider/sentry-provider-common/src/test/java/org/apache/sentry/provider/common/TestGetGroupMapping.java f57198a 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java bc35742 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestCachedPrivilege.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hive/AbstractTestWithStaticConfiguration.java cc5daef 
> 
> Diff: https://reviews.apache.org/r/28800/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Sravya Tirukkovalur <sr...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/#review103333
-----------------------------------------------------------


Thanks for the update Colin! Will it be possible for you to add description on how you are solving the problem. It would be easy to follow along and also would help review the approach before delving into the code details.

- Sravya Tirukkovalur


On Oct. 16, 2015, 11:18 a.m., Colin Ma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/28800/
> -----------------------------------------------------------
> 
> (Updated Oct. 16, 2015, 11:18 a.m.)
> 
> 
> Review request for sentry and Lenni Kuff.
> 
> 
> Repository: sentry
> 
> 
> Description
> -------
> 
> Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:
> 
> ......
> for (String tableName : queryResult) {
>   ......
>   hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
>             outputHierarchy, providedPrivileges);
>   ......
> }
> ......
> 
> hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.
> 
> 
> Diffs
> -----
> 
>   sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java 3071475 
>   sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/PolicyEngine.java 38a5b65 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilege.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilegeManagement.java PRE-CREATION 
>   sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/SimpleDBPolicyEngine.java a03794e 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/AbstractTestSimplePolicyEngine.java d1151e3 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestPolicyParsingNegative.java 5f7c671 
>   sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestSimpleDBPolicyEngineDFS.java f8c36e2 
>   sentry-policy/sentry-policy-indexer/src/main/java/org/apache/sentry/policy/indexer/SimpleIndexerPolicyEngine.java 2f4bc1d 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/AbstractTestIndexerPolicyEngine.java d7d1ae2 
>   sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/TestIndexerPolicyNegative.java 0706560 
>   sentry-policy/sentry-policy-search/src/main/java/org/apache/sentry/policy/search/SimpleSearchPolicyEngine.java bc518fb 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/AbstractTestSearchPolicyEngine.java d1c415b 
>   sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/TestSearchPolicyNegative.java 2abe8f2 
>   sentry-policy/sentry-policy-sqoop/src/main/java/org/apache/sentry/policy/sqoop/SimpleSqoopPolicyEngine.java e8615a0 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/AbstractTestSqoopPolicyEngine.java 1389fca 
>   sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/TestSqoopPolicyNegative.java 406e53f 
>   sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 06573b7 
>   sentry-provider/sentry-provider-common/src/test/java/org/apache/sentry/provider/common/TestGetGroupMapping.java f57198a 
>   sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java bc35742 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestCachedPrivilege.java PRE-CREATION 
>   sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hive/AbstractTestWithStaticConfiguration.java cc5daef 
> 
> Diff: https://reviews.apache.org/r/28800/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Colin Ma
> 
>


Re: Review Request 28800: SENTRY-565:Improvement the performance when Sentry filter the entity

Posted by Colin Ma <ju...@intel.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28800/
-----------------------------------------------------------

(Updated Oct. 16, 2015, 11:18 a.m.)


Review request for sentry and Lenni Kuff.


Repository: sentry


Description
-------

Currently, when get the metadata from hive, eg, "show tables", "show databases". Sentry will filter the result and output the authorized entities. There will be many RPC calls when filtering the result. The related code is in HiveAuthzBinding, for example, in filterShowTables:

......
for (String tableName : queryResult) {
  ......
  hiveAuthzBinding.authorize(operation, tableMetaDataPrivilege, subject, inputHierarchy,
            outputHierarchy, providedPrivileges);
  ......
}
......

hiveAuthzBinding.authorize will get the privileges from sentry service, if there are many tables in the hive, the filtering process will spend much time. Considering sentry also need to filter the column, HiveAuthzBinding should be improved to reduce the number of rpc calls when doing the filter.


Diffs (updated)
-----

  sentry-binding/sentry-binding-hive/src/main/java/org/apache/sentry/binding/hive/authz/HiveAuthzBinding.java 3071475 
  sentry-policy/sentry-policy-common/src/main/java/org/apache/sentry/policy/common/PolicyEngine.java 38a5b65 
  sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilege.java PRE-CREATION 
  sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/CachedPrivilegeManagement.java PRE-CREATION 
  sentry-policy/sentry-policy-db/src/main/java/org/apache/sentry/policy/db/SimpleDBPolicyEngine.java a03794e 
  sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/AbstractTestSimplePolicyEngine.java d1151e3 
  sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestPolicyParsingNegative.java 5f7c671 
  sentry-policy/sentry-policy-db/src/test/java/org/apache/sentry/policy/db/TestSimpleDBPolicyEngineDFS.java f8c36e2 
  sentry-policy/sentry-policy-indexer/src/main/java/org/apache/sentry/policy/indexer/SimpleIndexerPolicyEngine.java 2f4bc1d 
  sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/AbstractTestIndexerPolicyEngine.java d7d1ae2 
  sentry-policy/sentry-policy-indexer/src/test/java/org/apache/sentry/policy/indexer/TestIndexerPolicyNegative.java 0706560 
  sentry-policy/sentry-policy-search/src/main/java/org/apache/sentry/policy/search/SimpleSearchPolicyEngine.java bc518fb 
  sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/AbstractTestSearchPolicyEngine.java d1c415b 
  sentry-policy/sentry-policy-search/src/test/java/org/apache/sentry/policy/search/TestSearchPolicyNegative.java 2abe8f2 
  sentry-policy/sentry-policy-sqoop/src/main/java/org/apache/sentry/policy/sqoop/SimpleSqoopPolicyEngine.java e8615a0 
  sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/AbstractTestSqoopPolicyEngine.java 1389fca 
  sentry-policy/sentry-policy-sqoop/src/test/java/org/apache/sentry/policy/sqoop/TestSqoopPolicyNegative.java 406e53f 
  sentry-provider/sentry-provider-common/src/main/java/org/apache/sentry/provider/common/ResourceAuthorizationProvider.java 06573b7 
  sentry-provider/sentry-provider-common/src/test/java/org/apache/sentry/provider/common/TestGetGroupMapping.java f57198a 
  sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/service/thrift/ServiceConstants.java bc35742 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/dbprovider/TestCachedPrivilege.java PRE-CREATION 
  sentry-tests/sentry-tests-hive/src/test/java/org/apache/sentry/tests/e2e/hive/AbstractTestWithStaticConfiguration.java cc5daef 

Diff: https://reviews.apache.org/r/28800/diff/


Testing
-------


Thanks,

Colin Ma