You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2021/03/18 13:26:19 UTC

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Quanlong Huang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17199


Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

Tests:
 - Run column masking and row filtering tests in test_ranger.py
 - Run FE audit tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/Path.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
20 files changed, 267 insertions(+), 226 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 3:

Added more audit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 02:41:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6985/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Mar 2021 13:03:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/6987/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:52:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Mar 2021 18:42:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17199

to look at the new patch set (#3).

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Run column masking and row filtering tests in test_ranger.py
 - Run FE audit tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
20 files changed, 335 insertions(+), 226 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17199/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17199/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@860
PS5, Line 860: making
nit: masking


http://gerrit.cloudera.org:8080/#/c/17199/5/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/17199/5/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@88
PS5, Line 88: ---- QUERY
Thanks for adding the HAVING clause test. I am not sure if it came up earlier in the review but since you are also examining the SELECT list expressions, there could be scalar subqueries there as well.  It can occur at top level or within a CASE expr or an OVER clause. For example:

  SELECT int_col, (select min(bigint_col) from alltypestiny) as x from alltypes;
  SELECT int_col, case when id < (select min(id) from alltypestiny) then 0 else 1 end as x from alltypes t2;
  SELECT count(*) over (partition by case when id > (select min(id) from alltypestiny) then id else id + 5 end) as x from alltypes t2;

All these are converted to a Nested loop join between alltypestiny and alltypes.  
Currently, Impala only supports non-correlated SELECT list subqueries.
Could you add couple of such tests  ? Sorry, I should have suggested these earlier.



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Mar 2021 17:17:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8402/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 14:23:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 3:

(7 comments)

Thanks for working on this. Sending comments based on first pass.

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@491
PS3, Line 491: no user-visible effect
Is the 'no user-visible effect' accurate though ? The plan in the query profile will show the effects of the table masking isn't it ?


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@541
PS3, Line 541:     reAnalyzeWithoutPrivChecks(stmtTableCache, authzCtx, origResultTypes, origColLabels);
Shouldn't this only be called if reAnalyze = true ?  I am also curious why this is being called again (line 506 has the previous invocation).


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@860
PS3, Line 860: column %s
This says column but it seems the first parameter is resolvedTableRef's raw path ?


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1326
PS3, Line 1326: names
nit: this is registering a Column object rather than just the name.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1340
PS3, Line 1340:       analyzer = analyzer.getParentAnalyzer();
Since getParentAnalyzer() can return null, we should break out of the loop if that happens.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@171
PS3, Line 171:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
Since this is not looking into the orderByElements_, I am wondering if it will miss the ORDER BY slot refs for table masking.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@268
PS3, Line 268:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
Shouldn't this also consider the groupByClause_, havingClause_, aggregate and analyticInfo ? Are the columns referenced by those getting captured in some other manner ?



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 06:31:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 4: Code-Review+1

Other than one pending comment, this looks good to me for a +1.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 01:25:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8401/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 03:00:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 08:40:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6: Code-Review+2

Looks good to me. Let me know if you expect to make any other changes.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:40:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6:

> Patch Set 6: Code-Review+2
> 
> Looks good to me. Let me know if you expect to make any other changes.

Thanks for the review! It's ready to go. Running GVO on this.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:50:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Add more e2e tests in test_ranger.py
 - Run CORE tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Reviewed-on: http://gerrit.cloudera.org:8080/17199
Reviewed-by: Aman Sinha <am...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
M tests/authorization/test_ranger.py
23 files changed, 518 insertions(+), 239 deletions(-)

Approvals:
  Aman Sinha: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 7
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8400/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 02:20:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8388/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Mar 2021 13:46:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 4:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@541
PS3, Line 541:           analysisResult_.stmt_ =
> Yeah, when reAnalyze = false, we return at line 528. We call this at line 5
Ah, I missed the early return.  For the second call, yeah comments would be good.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@860
PS3, Line 860: place "co
> Yeah, it's actually a collection column. Here's an example query when the r
Thanks for the explanation.  Adding the comments makes sense.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@268
PS3, Line 268:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
> We only need to recurse in subqueries to replace all the catalog table/view
It is not clear to me why WHERE clause is considered here but HAVING is not even though HAVING can have a subquery..for example: 
  SELECT a, count(*) FROM t1 GROUP BY a HAVING count(*) > (SELECT min(b) FROM t2) 
If this is checked in another part of the code, that's fine.. maybe you can add the comments.



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 20 Mar 2021 01:18:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17199

to look at the new patch set (#5).

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Add more e2e tests in test_ranger.py
 - Run CORE tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
M tests/authorization/test_ranger.py
23 files changed, 457 insertions(+), 239 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17199

to look at the new patch set (#2).

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Run column masking and row filtering tests in test_ranger.py
 - Run FE audit tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
20 files changed, 270 insertions(+), 226 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17199

to look at the new patch set (#4).

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Run column masking and row filtering tests in test_ranger.py
 - Run FE audit tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
20 files changed, 352 insertions(+), 227 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 4
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Hello Aman Sinha, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17199

to look at the new patch set (#6).

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................

IMPALA-9661: Avoid introducing unused columns in table masking view

Previously, if a table has column masking policies, we replace its
unanalyzed TableRef with an analyzed InlineViewRef (table masking view)
in FromClause.analyze(). However, we can't detect which columns are
actually used in the original query at this point. In fact, analyze()
for SelectList, WhereClause, GroupByClause and other clauses containing
SlotRefs happen after FromClause.analyze(). After the whole query block
is analyzed, we can get the exact set of required columns.

This patch refactor the codes to do table masking after analyze() to
avoid introducing unused columns. Referenced columns of a TableRef are
registered in analyze(), which helps to figure out what columns are
actually needed.

With this, we don't need to revert table masking in FromClause.reset().
The doTableMasking flag in AST is also removed since now the table mask
is resolved once after analyze().

Tests:
 - Add more e2e tests in test_ranger.py
 - Run CORE tests

Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
---
M fe/src/main/java/org/apache/impala/analysis/AlterViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/FromClause.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/SetOperationStmt.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/StmtNode.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/analysis/TableRef.java
M fe/src/main/java/org/apache/impala/analysis/WithClause.java
M fe/src/main/java/org/apache/impala/authorization/TableMask.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
M testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
M testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
M tests/authorization/test_ranger.py
23 files changed, 518 insertions(+), 239 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/99/17199/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 2:

(1 comment)

Fix a test failure in AnalyzeStmtsTest. Slightly refactor some codes.

http://gerrit.cloudera.org:8080/#/c/17199/1/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
File fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java:

http://gerrit.cloudera.org:8080/#/c/17199/1/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java@359
PS1, Line 359:         assertEventEquals("@column", "select", "functional/alltypestiny/date_string_col",
> line too long (92 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 02:00:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8413/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Mar 2021 13:03:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 5:

(1 comment)

Added more tests. Fix the bug of HAVING clause and the null pointer bug introduce in PS4.

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@268
PS3, Line 268:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
> It is not clear to me why WHERE clause is considered here but HAVING is not
Ah, it's a bug! I forgot that HAVING clause can have subqueries too... Thanks for catching this!



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sun, 21 Mar 2021 12:47:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8414/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:30:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 3:

(7 comments)

Thank for the quick review! Added more comments.

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@491
PS3, Line 491: no user-visible effect
> Is the 'no user-visible effect' accurate though ? The plan in the query pro
Yeah, you are right. These are old comments. We should update them. I think it means the results will have the same types and labels as the original query regardless how we rewrite it. Even without table masking, the query plan may change after rewrite. So I think 'no user-visible effect' just means the query results.

Updated the comment.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java@541
PS3, Line 541:     reAnalyzeWithoutPrivChecks(stmtTableCache, authzCtx, origResultTypes, origColLabels);
> Shouldn't this only be called if reAnalyze = true ?  I am also curious why 
Yeah, when reAnalyze = false, we return at line 528. We call this at line 506 because rewriters require the AST is analyzed. Table masking will create new InlineViewRefs which are not analyzed. We also need some SlotRefs being resolved again to reference output exprs of these InlineViewRefs. Added some comments about this.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@860
PS3, Line 860: column %s
> This says column but it seems the first parameter is resolvedTableRef's raw
Yeah, it's actually a collection column. Here's an example query when the resolvedTableRef is a non-relative CollectionTableRef:

 use functional_parquet;
 select item from complextypestbl.int_array;

At the parsing phase, "complextypestbl.int_array" is parsed as a TableRef. Before analyze(), we don't know whether it's a table named "int_array" in the database named "complextypestbl". After analyze(), the TableRef is resolved. It's now a CollectionTableRef and isRelative() returns false. At this case, the column name we get is "functional_parquet.complextypestbl.int_array". We have test coverage for this in ranger_row_filtering.test.

FWIW, here is a example for a relative CollectionTableRef:

 select a.item from complextypestbl t, t.int_array a;

The TableRef "t.int_array" is a relative CollectionTableRef. It references to the complextypestbl table.

Added some comments since I think the "relative" concept is not straighforword to understand.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1326
PS3, Line 1326: names
> nit: this is registering a Column object rather than just the name.
Done


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@1340
PS3, Line 1340:       analyzer = analyzer.getParentAnalyzer();
> Since getParentAnalyzer() can return null, we should break out of the loop 
Good point! I think we should add assertion here. If we can't find the tblRef, it means the slotDesc is illegal.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java
File fe/src/main/java/org/apache/impala/analysis/QueryStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/QueryStmt.java@171
PS3, Line 171:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
> Since this is not looking into the orderByElements_, I am wondering if it w
No, we just need to recurse in subqueries so don't need to consider ORDER BY yet.


http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/SelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/17199/3/fe/src/main/java/org/apache/impala/analysis/SelectStmt.java@268
PS3, Line 268:   public boolean resolveTableMask(Analyzer analyzer) throws AnalysisException {
> Shouldn't this also consider the groupByClause_, havingClause_, aggregate a
We only need to recurse in subqueries to replace all the catalog table/view with table masking views. For SlotRefs referencing to the original table refs, they will be reset and re-analyzed to reference to the table masking view.



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Fri, 19 Mar 2021 14:03:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17199/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

http://gerrit.cloudera.org:8080/#/c/17199/5/fe/src/main/java/org/apache/impala/analysis/Analyzer.java@860
PS5, Line 860: maskin
> nit: masking
Done


http://gerrit.cloudera.org:8080/#/c/17199/5/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
File testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test:

http://gerrit.cloudera.org:8080/#/c/17199/5/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test@88
PS5, Line 88: ---- QUERY
> Thanks for adding the HAVING clause test. I am not sure if it came up earli
Sure. We do need coverage for this. Welcome for more test ideas!



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 6
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Mon, 22 Mar 2021 02:10:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9661: Avoid introducing unused columns in table masking view

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17199 )

Change subject: IMPALA-9661: Avoid introducing unused columns in table masking view
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17199/1/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java
File fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java:

http://gerrit.cloudera.org:8080/#/c/17199/1/fe/src/test/java/org/apache/impala/authorization/ranger/RangerAuditLogTest.java@359
PS1, Line 359:         assertEventEquals("@column", "select", "functional/alltypestiny/date_string_col", 1,
line too long (92 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17199
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib015a8ab528065907b27fbdceb8e2818deb814e1
Gerrit-Change-Number: 17199
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Mar 2021 13:27:04 +0000
Gerrit-HasComments: Yes