You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org> on 2021/03/01 00:59:56 UTC

[Impala-ASF-CR] IMPALA-9234: Support Ranger row filtering policies

Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16976 )

Change subject: IMPALA-9234: Support Ranger row filtering policies
......................................................................


Patch Set 3:

(9 comments)

Thanks Quanlong for adding the support for row-filtering in Impala! Sorry that it took me quite a while to review this patch. I briefly reviewed the added end-to-end tests and they look good to me. I only have some very minor comments.

I plan to read/study the parts related to audit logs generation and will try to get back to you soon the coming week.

http://gerrit.cloudera.org:8080/#/c/16976/1/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java
File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java:

http://gerrit.cloudera.org:8080/#/c/16976/1/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java@56
PS1, Line 56:    * Stash and deduplicate the audit events produced by table masking (Column-masking /
> Thanks Quanlong! I will review this and try to get back to you later today.
Thanks Quanlong! Maybe we could also add here a statement like "Refer to IMPALA-9597 for further details" so that people interested in this method could refer to there. I do not have any other suggestion. :-)


http://gerrit.cloudera.org:8080/#/c/16976/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java
File fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java:

http://gerrit.cloudera.org:8080/#/c/16976/3/fe/src/main/java/org/apache/impala/authorization/ranger/RangerAuthorizationContext.java@56
PS3, Line 56:   * Stash and deduplicate the audit events produced by table masking (Column-masking /
            :    * Row-filtering) which are performed during the analyze phase. Called at the end of
            :    * analyzing. These stashed events will be added back after the query pass the
            :    * authorization phase. Note that normal events (select, insert, drop, etc.) are
            :    * produced in the authorization phase. Stashing table masking events avoids exposing
            :    * them when the query fails authorization.
Thanks Quanlong!

Maybe we could also add here a statement like "Refer to IMPALA-9597 for further details" so that people interested in this method could refer to there as well. I do not have any other suggestion. :-)


http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
File testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test@16
PS3, Line 16: # Column-masking policies of functional.alltypes mask "id" to "-id" and redact column
            : # "date_string_col". Row-filtering policy of functional.alltypes_view keeps rows with
            : # "id >= -8 and date_string_col = 'nn/nn/nn'"
Thanks Quanlong for adding this test case!

Maybe it would be even clearer to emphasize here that the view 'alltypes_view' is based on the table 'alltypes' and thus the column masking policies were applied to 'alltypes' before the row-filter policies are applied to 'alltypes_view'.


http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@76
PS3, Line 76: support using subquery
Maybe it would be clearer to emphasize that the subquery is on the same table, i.e., 'functional', in order to contrast with the next test case.


http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@129
PS3, Line 129: select * from functional_parquet.complextypestbl.nested_struct.b
Maybe it would be good to briefly mention here that what the query is to fetch the desired result set if there is no row-filtering policy involved.


http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@136
PS3, Line 136: # The above query should be manually rewritten to this until IMPALA-10484 is resolved.
Thanks for adding this test case!


http://gerrit.cloudera.org:8080/#/c/16976/3/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@146
PS3, Line 146: ====
I was wondering if it would be good to add a test case involving 2 tables each associated with a row-filtering policy. For example, 'where id % 2 = 0' for the table 'functional.alltypestiny' and 'where id % 3 = 0' for the table 'functional.alltypessmall'


http://gerrit.cloudera.org:8080/#/c/16976/3/tests/authorization/test_ranger.py
File tests/authorization/test_ranger.py:

http://gerrit.cloudera.org:8080/#/c/16976/3/tests/authorization/test_ranger.py@1095
PS3, Line 1095:     user = getuser()
It may be good to add a comment here to point out that we do not need to additionally grant the SELECT privileges to 'getuser()' because 'getuser()' is the owner of the databases used in the test queries.

This test also shows that the a row-filtering policy takes precedence over the ownership of the resources involved in the policy. A user cannot access some rows in a table if there is some row-filtering policy applied to the table even though this user is the owner of the table. Is my understanding correct?


http://gerrit.cloudera.org:8080/#/c/16976/3/tests/authorization/test_ranger.py@1104
PS3, Line 1104: concat('0', '')
I am not very familiar with the creation of a row-filtering policy via REST API call. Is it true that simply using "string_col = '0'" does not work so that we have to use "concat('0', '')"?



-- 
To view, visit http://gerrit.cloudera.org:8080/16976
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I580517be241225ca15e45686381b78890178d7cc
Gerrit-Change-Number: 16976
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Mar 2021 00:59:56 +0000
Gerrit-HasComments: Yes