You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Alex Behm (Code Review)" <ge...@cloudera.org> on 2016/10/28 14:52:26 UTC

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Alex Behm has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/4877

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
7 files changed, 198 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/1
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 5:

Thanks for the explanation and your patience. I'm certainly open to learning more about a cleaner design.

For the most part, the output type of an Expr is determined by its input types.

Sounds like we need to sit together and flesh out the future direction a little more. I'm still not sure I follow how your example would solve the CTAS problem. Can a LITERAL(2) be inserted into a column with a TINYINT column?

I think there is actually a more severe problem than the ones you identified. Expr optimizations can potentially change the result of a query. I believe this is true even with our existing expr substitution logic. Here's one example:

bool_col OR bool_col --> bool_col

The original Expr will return FALSE when bool_col is NULL, but the new Expr will return NULL. We can come up with other such examples.
Dealing with NULLs is generally tricky when transforming Exprs.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> I think we should just be mindful that this change without normalisation ad
the latter is obviously much simpler.

i agree that the way it works now, this common-conjunct transformation only works for a very narrow use case.

to make it generally useful will require fairly extensive normalization (one of which should be to make && and || n-ary as opposed to binary, so we don't have to mix up traversal order with rule application - in this case, we'd have a single || predicate that includes all disjunctive clauses as children).


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

Line 70:         BetweenToCompoundRule.INSTANCE, ExtractCommonConjunctRule.INSTANCE);
> for added effectiveness you want the between rule to be applied before the 
Done


http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/Subquery.java
File fe/src/main/java/org/apache/impala/analysis/Subquery.java:

Line 155:     return stmt_.toSql().equals(((Subquery)o).stmt_.toSql());
> hm, that seems a bit fragile, won't insignificant syntactic differences (su
Agree, somewhat fragile. But correct.

This change exposed an existing bug involving the lack of a proper Subquery.equals(). I filed a separate JIRA and CR for that bug. Added the TODO to that patch which should go in before this one.

Bugfix CR:
http://gerrit.cloudera.org:8080/4923


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from a single disjunctive CompoundPredicate.
> I agree we don't need to document the framework here, but I think it's non-
Used your wording verbatim.


http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 51:     List<Expr> child0Conjuncts = expr.getChild(0).getConjuncts();
> checkstate that the child conjuncts aren't empty?
Added Preconditions check.

Like Tim mentioned this sounds like a "simplification" normalization rule that would work together with constant folding.

Arguably, your example does not fit this rule because there are no common conjuncts.


Line 51:     List<Expr> child0Conjuncts = expr.getChild(0).getConjuncts();
> true or x => true seems like a normalisation rule we should add in a follow
Good point.


Line 62:         conjunct.setPrintSqlInParens(false);
> add comment: why
Done


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> I think we should just be mindful that this change without normalisation ad
Good point about the perf cliff. I think it's doable to squeeze in a few simple normalization rules, e.g. for BinaryPredicates.

Agree that constant folding will need normalization as well for maximal effectiveness, e.g., 1 + a + 10

I was actually thinking of doing constant folding next (without handling that case above), unless you want to take it on instead.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 6:

I think there would have to be some distinction between logical types and physical types. E.g. during analysis we'd assign logical types like LITERAL(2) to the expression, which mapped to physical types like TINYINT for actual execution.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has restored this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Restored

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: restore
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#5).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Adds a new query option ENABLE_EXPR_REWRITES to enable/disable
non-essential expr rewrites in the FE. Note that some rewrites
are required, e.g., BetweenToCompoundRule. Disabling the rewrites
is useful for testing, in particular, to make sure that the exprs
specified in expr-test.cc are executed as written.

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
17 files changed, 281 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This disjunction gets
> We're losing real test coverage here. Can you rewrite the test in a way tha
Agree. I think we really want a query option to disable these kind of "optimizing" rewrites.

Adding a test that circumvents the rewrite is going to be hard, i.e. we won't be able to go through SQL at all.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule {
where is this being applied?


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
> I'm not really sure how this patch handles a bushy predicate. Do we flatten
i agree, i think we need normalization as a separate transformation step. this will be a prerequisite for other transformations as well, and having to bake normalization into every single transformation rule doesn't make sense.


Line 3393:       p_brand = 'Brand#23'
additional idea for transformation rule: derive additional, non-essential disjunctive scan predicates if the selectivity of the base predicates is sufficiently small (for '=', say).

in the example: brand = 12 or brand = 23 or brand = 34 as an extra scan predicate for 'parts', if 'brand' has a sufficiently high cardinality.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This disjunction gets
> How about we add a query test that tests || on a column with some NULLs in 
We can do that, but we're still losing test coverage because those are not the same exprs. With constant all these tests would run through the constant folding path, so I still think having a way of disabling the rewrite is saner.

Easy enough to retain the original type of the expr. Will do that.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 5:

I think we can probably describe the constant folding behaviour as some kind of dependent type system that distinguishes between constant expressions and non-constant expressions (and where the literal value is part of the logical type of the expression). This code essentially already is essentially treating "literal decimal" and "decimal" as different types: https://github.com/apache/incubator-impala/blob/9755ea63b50d23f323c84edb4307ce603ded4fe1/fe/src/main/java/org/apache/impala/analysis/Expr.java#L453

So there may be a reasonable way out of my second concern, and an argument that we've already created that problem.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has uploaded a new patch set (#2).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
12 files changed, 258 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/2
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 2: Code-Review+1

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from a single disjunctive CompoundPredicate.
> Expanded comments on ExprRewriter and modified comment here. 
I agree we don't need to document the framework here, but I think it's non-obvious that implementing it for a single CompoundPredicate is equally as powerful as implementing it for n CompoundPredicates.

If I understand correctly it relies on the bottom-up application of rewrite rules to "bubble up" common predicates.

E.g. "This rule extracts common conjuncts from multiple disjunctions when it is applied recursively bottom-up to a tree of CompoundPredicates".


Line 49: 
> Agree that this is problematic. Unfortunately, also non-trivial to fix and 
Codegen in the BE generally shouldn't be N^2, although the constant factor is much higher.


http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 51:     List<Expr> child0Conjuncts = expr.getChild(0).getConjuncts();
> checkstate that the child conjuncts aren't empty?
true or x => true seems like a normalisation rule we should add in a follow-up patch


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> Completely agree about having additional rules for normalization. I view th
I think we should just be mindful that this change without normalisation added a performance cliff where e.g. flipping x < y to y > x can change the plan. I think prior to this change the two were equivalent.

Seems ok as an intermediate state but we shouldn't consider the feature finished until we've made it a little more robust. I think this probably also relates to the constant expr -> literal rewrite that's on my plate.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Adds a new query option ENABLE_EXPR_REWRITES to enable/disable
non-essential expr rewrites in the FE. Note that some rewrites
are required, e.g., BetweenToCompoundRule. Disabling the rewrites
is useful for testing, in particular, to make sure that the exprs
specified in expr-test.cc are executed as written.

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Reviewed-on: http://gerrit.cloudera.org:8080/4877
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Internal Jenkins
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
18 files changed, 282 insertions(+), 23 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/3/testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test:

Line 1080: |     kudu predicates: p_size >= 1
> what happened here (why didn't this kick in before)?
My Kudu setup was broken so I could not run these Kudu tests locally (and I had forgotten to update based on a Jenkins run). I fixed my local setup.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/3/testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test:

Line 1080: |     kudu predicates: p_size >= 1
what happened here (why didn't this kick in before)?


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 6: Verified-1

Build failed: http://sandbox.jenkins.cloudera.com/job/impala-external-gerrit-verify-merge-ASF/454/

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Internal Jenkins (Code Review)" <ge...@cloudera.org>.
Internal Jenkins has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 7: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#3).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Subquery.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 269 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

I added a new query option ENABLE_EXPR_REWRITES and set it to false for expr-test.cc

I investigated the NULL type change and it turns out there is no easy way to preserve the type with out current approach because after rewriting we wipe all analysis state and re-analyze fresh. At that point, there is no way for us to know what type a NullLiteral should have. I don't see any harm in it.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

Please take another quick look at expr-test.cc

I needed to do an interesting fix because some exprs get simplified and change their result type.

I'm thinking of adding a query option for disabling the FE expr rewrites, so that we can run the BE expr-test without the rewrites.
Might not be a bad idea to have such a switch for disabling the rewrites in case bugs sneak in as we add more complex rules. Thoughts?

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This disjunction gets
> Agree. I think we really want a query option to disable these kind of "opti
How about we add a query test that tests || on a column with some NULLs in it so we're exercising the runtime code path?

I don't really like the fact that the rewrite changes the type of the expression but it seems mostly harmless - how hard is it to retain the original type of the expression though?


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(11 comments)

Still working on a minor test fix, sending out to continue CR.

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
> Might be worth commenting on how this handles > 2 disjuncts. I.e. you need 
Expanded comments on ExprRewriter and modified comment here. 

(I don't think we should document how the framework works in every rule.)


Line 37: public class ExtractCommonConjunctRule implements ExprRewriteRule {
> where is this being applied?
At the end of analysis, made those changes (had forgotten this after rebasing/cleaning)


Line 42:     if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr;
> Isn't this check a little weak in the sense that it only checks the root's 
Having separate rules for normalization certainly makes sense, but I see them as being independent of this change. I don't think we'd change the code/behavior of the current rule even if we had normalization.

In your example, we would not return 'a' because the single conjunct of child0 would be (a OR b) which is not common to any of the child1 conjuncts "a" and "c".

Maybe you can come up with another example that breaks?


Line 49:       if (child1Conjuncts.contains(conjunct)) {
> This is n^2 so will get very slow if we have long lists of disjuncts, e.g. 
Agree that this is problematic. Unfortunately, also non-trivial to fix and make sure the hashCode() and equals() functions are correct.

I added an arbitrary threshold on the maximum number of Expr.equals() comparisons to bail on these pathological cases.

It's also not clear whether this rule would be much worse than the codegen in the BE. Not a good argument to avoid the improvement, but the improvement here alone might not make a material impact for the overall query.


Line 72:     if (!child0Conjuncts.isEmpty()) {
> At this point both child0Conjuncts and child1Conjuncts are non-empty becaus
Right, forgot to correct after some cleanup. Done.


PS1, Line 84: newDisjuncts.size() > 1
> See above - this should always be true.
You are right, thanks. Done.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
> Slightly tangential, but should we apply De Morgan's first to normalise the
Completely agree about having additional rules for normalization. I view those as somewhat independent of the current rule. I don't think the code/approach of the current rule would change much even if we had normalization. Of course, normalizing would make the current rule useful in more cases.

Let's continue this train of thought and come up with useful normalization rules in a separate change set. Definitely needed.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
> Does common conjunct extraction work if you have a BETWEEN expression and t
Good point, there was a bug. Fixed bug and added tests.


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
> i agree, i think we need normalization as a separate transformation step. t
Agree.


Line 3384:       p_brand = 'Brand#12'
> I'm not really sure how this patch handles a bushy predicate. Do we flatten
Agree we need normalization as a separate set of rules.

I'm not sure normalization is required for your specific example though. The interesting conjuncts are those that bubble up all the way to the top so that, e.g., we can assign them as scan predicates. I think the current implementation does that, although some opportunities for simplifying disjunctive subexpressions are missed.


Line 3393:       p_brand = 'Brand#23'
> additional idea for transformation rule: derive additional, non-essential d
Agree. Sounds like IMPALA-2108.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 7: Code-Review+2

Trivial update in expected plan to an equivalent one.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 5:

I understand that it's harmless in that queries that succeeded before the patch won't start failing after the patch, but I'm concerned that a) queries that succeed with optimisation on will fail or behave differently when you turn optimisations off because the analyser picks a wider type and b) the lack of distinction between user-visible types and "internal" types will be a source of bugs and confusing behaviour.

There may only be a couple of cases where this is user-visible (CTAS and typeof() come to mind), so maybe there's a way to fix that there.

The longer-term design concern is if there aren't clear rules for determining the types of expressions that can be applied during the initial semantic analysis (e.g. BOOL || BOOL => BOOL), it makes it very hard to change the implementation of the frontend without changing the user-visible behaviour of the frontend.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 42:     if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr;
Isn't this check a little weak in the sense that it only checks the root's Op type to be OR. IIUC, a case like ((a OR b) OR (a AND c)) still passes and returns 'a' as the common conjunct. Is it better to have a check that makes sure the expr is in a proper disjunctive normal form?


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
I'm not really sure how this patch handles a bushy predicate. Do we flatten them somewhere and convert it to a DNF before applying the new rule? (May be I misread the code)

For example: ((a or (b and c)) or ((a and b) or (d or f)))

The tests here and elsewhere seem to be handling simple predicates, so I'm wondering how cases like above are dealt with.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Internal Jenkins, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#7).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Adds a new query option ENABLE_EXPR_REWRITES to enable/disable
non-essential expr rewrites in the FE. Note that some rewrites
are required, e.g., BetweenToCompoundRule. Disabling the rewrites
is useful for testing, in particular, to make sure that the exprs
specified in expr-test.cc are executed as written.

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M be/src/exprs/expr-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
18 files changed, 282 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 5:

I don't agree with the argument that the new behavior with constant folding will be less intuitive. I think it will be more user friendly.

In your example, of CTAS I think a reasonable user can expect:

insert into t select 1 + 1

to work if t has a tinyint column.

I think the only way we can satisfy your design goal is to always defer to the widest type, i.e. 1 + 1 -> smallint.

For simple cases we can fold in place, but with more complicated examples we have to substitute against inline view expressions to even realize something is constant, so that doesn't seem feasible to me.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 5: Code-Review+1

I was less concerned with what the output type is and more about how it is obtained. The problem is that we don't have any comprehensible rules about how the output type of an expression is obtained or what the input to the type derivation algorithm even is. In my mind a query option shouldn't be part of the input to the type derivation. 

It's all a consequence of a pre-existing design so I'm not saying you should fix it as part of this change.

I think the original sin is that the output type of an operation should be determined by the input types, because that gives you a reasonable way to trace back and understand how the type of an expression was derived. This isn't the case today since it looks at other information and maybe depends in complicate way on other rewrites.

My point about dependent types is that you can achieve what you're arguing for without making typechecking depend on a query optimisation. E.g. the logical types would be "LITERAL(1) + LITERAL(1) => LITERAL(2)" instead of "TINYINT + TINYINT => SMALLINT) and they would be lowered at some point into actual physical types. The current system of typing literals seems to be trying to approximate this.

I'm not saying we shouldn't go ahead, but I think there are design problems with the type system and I think this is digging the hole a little deeper. I'm ok as long as we have a way to replace it with a more principled approach later without breaking things. It seems like it's mostly corner cases that would be affected in the meantime and there probably is a way out.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4877/4/be/src/exprs/expr-test.cc
File be/src/exprs/expr-test.cc:

Line 1539:   // TODO: Add option to disable expr rewrites for this test. This disjunction gets
We're losing real test coverage here. Can you rewrite the test in a way that the rewrite doesn't get applied? Or make sure that there's a system test that covers this case?


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Hello Marcel Kornacker, Tim Armstrong,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/4877

to look at the new patch set (#4).

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................

IMPALA-1286: Extract common conjuncts from disjunctions.

Adds a new ExprRewriteRule to extract common conjuncts from
disjunctions.

Examples:
(a AND b AND c) OR (b AND d) ==> b AND ((a AND c) OR (d))
(a AND b) OR (a AND b) ==> a AND b
(a AND b AND c) OR (c) ==> c

Testing: Added a new unit test in ExprRewriteRulesTest.

Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
---
M be/src/exprs/expr-test.cc
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/BetweenPredicate.java
M fe/src/main/java/org/apache/impala/analysis/CompoundPredicate.java
M fe/src/main/java/org/apache/impala/analysis/ExistsPredicate.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriteRule.java
A fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
M fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
M fe/src/test/java/org/apache/impala/analysis/ExprTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-nested.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-views.test
13 files changed, 266 insertions(+), 23 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/4877/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Bharath Vissapragada (Code Review)" <ge...@cloudera.org>.
Bharath Vissapragada has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 42:     if (!Expr.IS_OR_PREDICATE.apply(expr)) return expr;
> Having separate rules for normalization certainly makes sense, but I see th
Ah makes sense. I misunderstood the getConjuncts() call. Thanks for clearing it up.


http://gerrit.cloudera.org:8080/#/c/4877/1/testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test
File testdata/workloads/functional-planner/queries/PlannerTest/tpch-all.test:

Line 3384:       p_brand = 'Brand#12'
> Agree we need normalization as a separate set of rules.
Thanks. I think I understand the scope of this fix now. My example was just to show a complex predicate and I didn't check if normalization would apply specifically to that case. Also, if you think its fine, we should probably document a TODO somewhere in ExtractCommonConjunctRule the cases it can't deal without normalization, till we actually implement normalization as a separate step.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 2: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

Line 70:         BetweenToCompoundRule.INSTANCE, ExtractCommonConjunctRule.INSTANCE);
for added effectiveness you want the between rule to be applied before the common-conjuncts rule.

point this out in a comment.


http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/analysis/Subquery.java
File fe/src/main/java/org/apache/impala/analysis/Subquery.java:

Line 155:     return stmt_.toSql().equals(((Subquery)o).stmt_.toSql());
hm, that seems a bit fragile, won't insignificant syntactic differences (such as extra parentheses) throw this off?

i guess we're missing a QueryStmt.equals(), and it's too ambitious for this patch, but please leave a todo.


http://gerrit.cloudera.org:8080/#/c/4877/2/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 51:     List<Expr> child0Conjuncts = expr.getChild(0).getConjuncts();
checkstate that the child conjuncts aren't empty?

what about "pathological" clauses like (true or b)?


Line 62:         conjunct.setPrintSqlInParens(false);
add comment: why


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has abandoned this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Abandoned

Putting this on hold for now.

-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: abandon
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>

[Impala-ASF-CR] IMPALA-1286: Extract common conjuncts from disjunctions.

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-1286: Extract common conjuncts from disjunctions.
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java
File fe/src/main/java/org/apache/impala/rewrite/ExtractCommonConjunctRule.java:

Line 31:  * Extracts common conjuncts from disjunctive CompoundPredicates.
Might be worth commenting on how this handles > 2 disjuncts. I.e. you need to apply the rule from the bottom-up.


Line 49:       if (child1Conjuncts.contains(conjunct)) {
This is n^2 so will get very slow if we have long lists of disjuncts, e.g. machine-generated queries. This is fine for small n but I think we should seriously consider switching to HashMaps for child1Conjuncts and commonConjuncts for n > some threshold. The core logic I think is the same if you factor it out into a function that operates on Collection<Expr>.


Line 72:     if (!child0Conjuncts.isEmpty()) {
At this point both child0Conjuncts and child1Conjuncts are non-empty because we would have returned early otherwise. So I think these branches are unnecessary.


PS1, Line 84: newDisjuncts.size() > 1
See above - this should always be true.


http://gerrit.cloudera.org:8080/#/c/4877/1/fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java
File fe/src/test/java/org/apache/impala/analysis/ExprRewriteRulesTest.java:

Line 146:         "(!(int_col=5 or tinyint_col > 9) and double_col = 7) or " +
Slightly tangential, but should we apply De Morgan's first to normalise the disjunct into a conjunct? That would help if we had something like:

  (!(int_col=5 or tinyint_col > 9) and double_col = 7) or (!(int_col = 5) and !(tinyint_col > 9) and double_col = 8)

I guess in general it would be nice to do some basic normalisation of expressions in other cases, e.g. convert !(tinyint_col > 9) to tinyint_col <= 9, convert 9 < tinyint_col to tinyint_col <= 9, etc.

I think that would make the common conjunct extraction a bit more robust. You can rewrite the query but avoiding that is the motivation for this in the first place, so it would be nice if it worked in some more simple cases.


Line 153:         "(int_col < 10 and id between 5 and 6) or " +
Does common conjunct extraction work if you have a BETWEEN expression and the equivalent expression after the rewrite? Should work if we apply the extraction after the BETWEEN rewrite, but would be nice to test.


-- 
To view, visit http://gerrit.cloudera.org:8080/4877
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3cf9b950afaa3fd753d1b09ba5e540b5258940ad
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-HasComments: Yes