You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@sentry.apache.org by "Dapeng Sun (JIRA)" <ji...@apache.org> on 2016/01/15 10:14:39 UTC

[jira] [Comment Edited] (SENTRY-1007) Sentry column-level performance for wide tables

    [ https://issues.apache.org/jira/browse/SENTRY-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15101521#comment-15101521 ] 

Dapeng Sun edited comment on SENTRY-1007 at 1/15/16 9:14 AM:
-------------------------------------------------------------

Hi [~anneyu] 

Which version of Sentry you are testing?

[~colinma] implemented a solution at SENTRY-565. But the target version is 1.7.0, you can apply the patch if you need.

{noformat}1) select * from table {noformat}
{noformat}2) show columns in table{noformat}
These issues will be fixed after SENTRY-565
{noformat}3) show count(*) in table{noformat}
As you said, this issue doesn't exist.
{noformat}4) select col1,col2,col3,.....,colN from test_tb1{noformat}
Here is the initial patch for fixing issue 4.


was (Author: dapengsun):
Hi [~anneyu] 

Which version of Sentry you are testing?

[~colinma] implemented a solution at SENTRY-565. But the target version is 1.7.0, you may need to apply the patch if you need.

{noformat}1) select * from table {noformat}
{noformat}2) show columns in table{noformat}
These issues will be fixed after SENTRY-565
{noformat}3) show count(*) in table{noformat}
As you said, this issue doesn't exist.
{noformat}4) select col1,col2,col3,.....,colN from test_tb1{noformat}
Here is the initial patch for fixing issue 4.

> Sentry column-level performance for wide tables
> -----------------------------------------------
>
>                 Key: SENTRY-1007
>                 URL: https://issues.apache.org/jira/browse/SENTRY-1007
>             Project: Sentry
>          Issue Type: Bug
>    Affects Versions: 1.5.1
>            Reporter: Anne Yu
>            Assignee: Dapeng Sun
>            Priority: Critical
>         Attachments: SENTRY-1007.001.patch
>
>
> It appears a query is taking a long time on a wide table due many Sentry column-level auth checks. 
> here are some investigation results:
> 1) create a table with 4000 columns, grant select on [table|db] to test user, however {noformat}select * from table{noformat} still validates column level privilege on each column. So this select command issues 4000 queries to validate column level permissions. It takes 40 seconds on my test cluster (2x large) to return results, for a moment, query seems to freeze:
> {noformat}
> ...
> 2016-01-11 11:54:14,816 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test997") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
> 2016-01-11 11:54:14,822 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test998") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
> 2016-01-11 11:54:14,828 INFO DataNucleus.Query: Reading in results for query "SELECT FROM org.apache.sentry.provider.db.service.model.MSentryPrivilege WHERE roles.contains(role) && (role.roleName == "test_role") && serverName == "server1" && ((dbName == "test_db") || (dbName == "__NULL__")) && (URI == "__NULL__") && ((tableName == "test_tb") || (tableName == "__NULL__")) && (URI == "__NULL__") && ((columnName == "test999") || (columnName == "__NULL__")) && (URI == "__NULL__") VARIABLES org.apache.sentry.provider.db.service.model.MSentryRole role" since the connection used is closing
> {noformat}
> Here is the debug log from sentry service:
> {noformat}
> org.apache.sentry.binding.hive.authz.HiveAuthzBinding.authorize(HiveAuthzBinding.java:304)] requiredInputPrivileges = {Table=[SELECT], Column=[SELECT], URI=[ALL]}
> {noformat}
> 2) the same issue for {noformat}show columns in table{noformat}
> 3) for {noformat}show count(*) in table{noformat}, it requires table level privilege, so this issue doesn't exist. 
> 4) I found out there are more commands have the same issues are:
> {code}
> SHOW COLUMNS FROM test_tb1;
> create table test_tb2 as select * from test_tb1
> show partitions in test_tb1
> select * from test_tb1
> {code}
> Even for {code}select col1,col2,col3 from test_tb1{code} will issue 3 queries for each column, instead of one query for all columns in one table;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)