You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zach Amsden (Code Review)" <ge...@cloudera.org> on 2017/03/14 17:12:17 UTC

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Zach Amsden has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6389

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................

IMPALA-5003: [DRAFT] Generic constant propagation in planner

Rather than specialize the constant propagation framework to be specific
to a particular node type, create a generic framework for migrating
constants between conjuncts.  Right now it is still hard coded as being
embedded in HdfsScanNode, but the initial support appears to be working.

Testing: So far, manual.  Plan is to examine query plans once the full
support is working.  Right now, the substitution and folding appears to
work, but this doesn't make it to predicates in the query plan, nor does
it help select partitions, so some piece is definitely missing.

 explain select * from zparted where o_orderkey = 10 and partition_id =
o_orderkey % 100 + o_orderkey;

ConstantPropagator.java:85] Constant substitution from slot 2:
         partition_id = 10 % 100 + 10
ConstantPropagator.java:106] Constant folded result: partition_id = 20

 00:SCAN HDFS [default.zparted]
 partitions=100/100 files=100 size=23.00MB
 predicates: o_orderkey = 10, partition_id = o_orderkey % 100 + o_orderkey

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
3 files changed, 121 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................


Patch Set 2:

(12 comments)

Looks reasonable.

http://gerrit.cloudera.org:8080/#/c/6389/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2776:             LOG.trace("value transfer: to " + slotIds.first.toString()
nit: can we flip to/from


http://gerrit.cloudera.org:8080/#/c/6389/2/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 46:   private static ExprRewriter rewriter;
ExprRewriter is not thread-safe. No good to make it static.


Line 51:     List<ExprRewriteRule> rewrite_rules = Lists.newArrayList();
Java style: rewriteRules


Line 53:     rewrite_rules.add(FoldConstantsRule.INSTANCE);
I think we want to re-run all the expr rewrites after constant propagation.

We could either cram everything in here, or minimize the ConstantPropagator. I'd prefer the latter. Constant folding is not strictly necessary here, but normalization is since your the constant-propagation code relies on it.


Line 63:   public static void propagateConstants(List<Expr> l, Analyzer analyzer)
l -> conjuncts?


Line 65:     if (l == null) return;
why not make this a Precondition


Line 66:     int changes = 1;
boolean changed?


Line 71:         if (expr instanceof BinaryPredicate &&
This is a debatable style thing, but in the rest of the FE code we typically invert such conditions and then 'continue' to avoid deep indentation. So in this case:


Line 84:                 ++changes;
Why not run the rewrites here directly. That way we skip over exprs that have not changed.


Line 85:                 LOG.info("Constant substitution from slot " +
Definitely don't want this at info level.


Line 106:             LOG.info("Constant folded result: " + rewritten.toSql());
Definitely don't want this at info level.


Line 111:       for (int i = 0; i < output.size(); ++i) l.set(i, output.get(i));
Collections.copy()


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 15:

(14 comments)

Code is looking good, will take another pass over the tests

http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 923:    * Propagates constant expressions of the form <slot ref> = <constant> to
Explain candidate param


Line 924:    * other uses of slot ref in all expressions in conjuncts; returns a BitSet with
in all expressions in conjuncts -> in the given conjnucts


Line 942:         // Don't rewrite with our own substitution!  We may have already processed
Explanation seems more complicated than necessary. Clearly it would be wrong to propagate a constant to the originating binary predicate.

Your function comment is quite accurate "to other uses of slot ref" :)


Line 960:    * Returns true if the conjuncts may be true and false if a contradiction has
Shorter: Returns false if a contradiction has been implied, true otherwise.


Line 968:       BitSet candidates = new BitSet(conjuncts.size());
> Alex, thanks for the BitSet suggest - this worked out far cleaner than any 
Yea, this looks pretty good to me!


Line 980:           // applied by the rewriter.  We must also re-analyze result exprs to
I think we can remove everything after "We must also re-analyze  ..." to keep the comment short.

Also, I think constant propagation should fall under the "enable_expr_rewrites" query option, i.e., if false we should only remove duplicates in here. Otherwise, it's hard to say what that query option means at a high level.


Line 985:           if (index < 0) continue;
> This can be Preconditions.checkState(index >= 0) now.  In the prior form of
Agree. Makes sense.


Line 988:           // Reset Expr to force updating cost
Re-analyze expr to add implicit casts, resolve function signatures, and compute cost


Line 998:           // because they can't be evaluated if expr rewriting is turned off.
> I can't see a good way to test this with the fe tests.  Should we write a c
You should be able to do this with a PlannerTest. For example, look at PlannerTest.testParquetFiltering() which sets a query option and then runs a .test file.

But the easier solution, imo, is to not propagate constants when expr rewriting is disabled.


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1266: 
> Optimization should still be done in the form of eliminating duplicates, ev
Looks like the constant propagation is applied even when expr rewrites are disabled. Imo, it would be preferable not to propagate constants in that case. Otherwise, it's kind of hard to reason about the meaning of that query option.


http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1273:     // Allow constant propagation within conjuncts.
This actually has two purposes:
1. constant propagation
2. expr optimization on fully-resolved conjuncts


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 101: # XXX - this seems correct, but is this a legal transformation?
> I'm fairly convinced at this point that considering slotrefs through implic
I agree that implicit casts on both sides seem ok.

It does not seem fine if there is an explicit cast on the slotref.
In your example, the 'cast(10000 as tinyint)' is constant and will be folded (truncated), so this case seems fine.


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/joins.test:

Line 381: # hbase-hdfs join with scan filtering (bogus)
Can you be more specific than "(bogus)"? What behavior are we testing here?


Line 394: 02:HASH JOIN [INNER JOIN]
> This is a very sad plan.
Agree that this can be improved.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 23:

So this tripped up on an functional.alltypes_datasource test, and the first failure was obvious, but now there is a new failure mode which isn't obvious at all, and unclear how it relates to this change:

[localhost:21000] > select string_col from alltypes_datasource
where string_col = 'VALIDATE_PREDICATES##id NOT_DISTINCT 1 && id DISTINCT_FROM 1 && id NOT_DISTINCT 1'
      and 1 <=> id and 1 IS DISTINCT FROM id and 1 IS NOT DISTINCT FROM id;
Query: select string_col from alltypes_datasource
where string_col = 'VALIDATE_PREDICATES##id NOT_DISTINCT 1 && id DISTINCT_FROM 1 && id NOT_DISTINCT 1'
      and 1 <=> id and 1 IS DISTINCT FROM id and 1 IS NOT DISTINCT FROM id
Query submitted at: 2017-04-26 17:20:43 (Coordinator: http://impala-dev:25000)
ERROR: InternalException: Data source DataSource{name=alltypesdatasource, location=hdfs://localhost:20500/test-warehouse/data-sources/test-data-source.jar, className=org.apache.impala.extdatasource.AllTypesDataSource, apiVersion=V1} returned an error from prepare(): Error in data source (path=/tmp/test-data-source.29405.0.jar, class=org.apache.impala.extdatasource.AllTypesDataSource, version=V1) prepare(): No error message returned by data source. Check the impalad log for more information.

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 23: Code-Review+2

Let me know if you need help investigation that last issue.

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................


Patch Set 2:

(12 comments)

Thanks for the fast review!

http://gerrit.cloudera.org:8080/#/c/6389/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2776:             LOG.trace("value transfer: to " + slotIds.first.toString()
> nit: can we flip to/from
Done.  It's still not entirely clear which value is to and which is from, but I found the output more useful with both values.


http://gerrit.cloudera.org:8080/#/c/6389/2/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 46:   private static ExprRewriter rewriter;
> ExprRewriter is not thread-safe. No good to make it static.
Done


Line 51:     List<ExprRewriteRule> rewrite_rules = Lists.newArrayList();
> Java style: rewriteRules
Done


Line 53:     rewrite_rules.add(FoldConstantsRule.INSTANCE);
> I think we want to re-run all the expr rewrites after constant propagation.
I added a separate optional pass to do this and consolidated all the rules in one place.


Line 63:   public static void propagateConstants(List<Expr> l, Analyzer analyzer)
> l -> conjuncts?
Done


Line 65:     if (l == null) return;
> why not make this a Precondition
Done


Line 66:     int changes = 1;
> boolean changed?
Done


Line 71:         if (expr instanceof BinaryPredicate &&
> This is a debatable style thing, but in the rest of the FE code we typicall
I like the negation style :)


Line 84:                 ++changes;
> Why not run the rewrites here directly. That way we skip over exprs that ha
Done


Line 85:                 LOG.info("Constant substitution from slot " +
> Definitely don't want this at info level.
Done


Line 106:             LOG.info("Constant folded result: " + rewritten.toSql());
> Definitely don't want this at info level.
Done


Line 111:       for (int i = 0; i < output.size(); ++i) l.set(i, output.get(i));
> Collections.copy()
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6389

to look at the new patch set (#23).

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................

IMPALA-5003: Constant propagation in scan conjuncts

Implements constant proopagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
19 files changed, 634 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/23
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#13).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 535 insertions(+), 112 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................


Patch Set 2:

I was not returning the result.  Surely there must be a short-hand for such list assignment that I am missing.

Now this is done, inline views just automatically work:

[localhost:21000] > show create table zhiddenpart;
Query: show create table zhiddenpart
+----------------------------------------------------------------------------------------------------------+
| result                                                                                                   |
+----------------------------------------------------------------------------------------------------------+
| CREATE VIEW `default`.zhiddenpart AS                                                                     |
| SELECT o_orderstatus, o_custkey, o_orderkey FROM `default`.zparted WHERE partition_id = o_orderkey % 100 |
+----------------------------------------------------------------------------------------------------------+
Fetched 1 row(s) in 0.01s
[localhost:21000] > explain select * from zhiddenpart where o_orderkey = 1;
Query: explain select * from zhiddenpart where o_orderkey = 1
+------------------------------------------------------------------------------------+
| Explain String                                                                     |
+------------------------------------------------------------------------------------+
| Estimated Per-Host Requirements: Memory=32.00MB VCores=1                           |
| WARNING: The following tables are missing relevant table and/or column statistics. |
| default.zparted                                                                    |
|                                                                                    |
| PLAN-ROOT SINK                                                                     |
| |                                                                                  |
| 01:EXCHANGE [UNPARTITIONED]                                                        |
| |                                                                                  |
| 00:SCAN HDFS [default.zparted]                                                     |
|    partitions=1/100 files=1 size=235.39KB                                          |
|    predicates: default.zparted.o_orderkey = 1                                      |
+------------------------------------------------------------------------------------+
Fetched 11 row(s) in 4.22s

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................


Patch Set 3:

(17 comments)

No longer a draft.  Now with tests!

http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

Line 79:     rewriter_ = ExprRewriter.createExprRewriter(
> I think we don't actually need the rewriter_ as a member, and instead we ca
I can incorporate this next round if you want, but this diff didn't introduce it as a member.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2777:                       + " to " + slotIds.second.toString());
> I'm not even sure this message helps at all. Remove?
Done


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 44: public final class ConstantPropagator {
> brief class comment
Done


Line 51:   //public static ConstantPropagator INSTANCE = new ConstantPropogator();
> remove
Done


PS3, Line 54:   
> comment on thrown exception and state of conjuncts in case of error
fixed


Line 56:   public static boolean propagateConstants(List<Expr> conjuncts, Analyzer analyzer)
> make private, we can still make it public later when the need arises, but f
Done


Line 63:     while (changed) {
> brief comment, maybe with a simple example to show why we need to do this r
Done


Line 80:             try {
> no need for try/catch if we're just going to throw e anyway
Done


Line 87:           output.set(i, rewritten);
> Seems cleaner to only do this when there was a change. That's how our expr 
Done


Line 90:       if (changed) Collections.copy(conjuncts, output);
> Move this outside of the loop. You can iterate over output in L65
Done


Line 95:   // Propogates constants, removes duplicates and optionally performs expr rewriting
> Mention that errors in the rewrite are logged, but ignored, and the 'conjun
Done


Line 96:   public static void optimizeConjuncts(List<Expr> conjuncts, Analyzer analyzer,
> instead of creating a class, why not just move this as static functions int
Expr is already a pretty long class and is more of a utility classes for expr than a policy or implementation class for any optimizations.  I didn't want aggressive optimization specific functions to start infecting Expr.

I also thought maybe we could try optimizing (SR1 <= K, SR2 <= SR1 -> SR2 <= K) (and >= as well, this propagation is valid for any total ordering, but not >,< ) in the future.

If we do that, this class would probably more warrant standing on its own.  Doing so required more machinery in Expr matching than we currently have so I didn't try it in this diff.


Line 101:         for (int i = 0; i < conjuncts.size(); ++i) {
> rewriter.rewriteList()
Done


Line 107:     catch (AnalysisException e) {
> goes after the } in L106
Done


Line 108:       LOG.error("Not able to analyze after rewrite: " + e.toString());
> LOG.warn and print the conjuncts in this message
Done


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> To clean up this retrieval of enable_expr_rewrites, consider adding an crea
Let me see if that comes out cleaner.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   public enum RuleSet {
> I see what you are doing, but I don't think we need to be quite as general 
Agreed.  Reduced this to two sets.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................


Patch Set 11:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 924:    * Propagates constant expression of the form <slot ref> = <constant> to
expressions


Line 931:     for (Expr expr: conjuncts) {
It's probably not that important to worry about perf, but I think it should be simple enough to change this algorithm to avoid redundant and unnecessary work. For example:
- we only need to apply the smap to non <slot ref> = <constant> exprs
- it's cheaper to create one big smap, rather than one smap with a single mapping
- each propagation round definitely reduces the set of exprs we need to consider in the next round


Line 964:   private static boolean collateInequalities(List<Expr> conjuncts, Analyzer analyzer) {
Please avoid scope creep and leave this out. Feel free to file a new JIRA for this improvement.


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 968:    * to construct a ValueRange, 
whitespace


Line 993:       if (!(comp.getOp() == BinaryPredicate.Operator.GT)  &&
use !=


Line 996:           !(comp.getOp() == BinaryPredicate.Operator.LE)) continue;
use {} for multi-line ifs


Line 1134:     Expr.optimizeConjuncts(preds, analyzer);
I don't think we should add this in this patch because the behavior could be confusing to us and users (might get different plans depending on small changes to a query, even worse you might get better plans if you use more inline views!). Also why would we even optimize the pre-substituted exprs, i.e. before L1139?


Line 1252:     TupleDescriptor tupleDesc = tblRef.getDesc();
inline into the only use in L1256


Line 1256:     TupleId id = tupleDesc.getId();
id -> tid


Line 1266:     if (!Expr.optimizeConjuncts(conjuncts, analyzer)) {
Only if expr rewrites are enabled?


Line 1267:       // Conjuncts implied false; convert to emptySetNode
EmptySetNode


Line 1270:       return node;
nice!


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/ValueRange.java
File fe/src/main/java/org/apache/impala/planner/ValueRange.java:

Line 107:         !(upperBound_ instanceof LiteralExpr) || !(lowerBound_ instanceof LiteralExpr))
use {} for multi-line ifs


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   private Expr rewrite(Expr expr, Analyzer analyzer, boolean reanalyze)
Why are these changes needed? Can we avoid them?

Not re-analyzing is an issue, but I'd prefer to solve that differently.


Line 110:   public void reset() {
single line


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6389/6/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

Line 72:     customRewriter_ = null;
This was an unexpected wrinkle that made things awkward.


http://gerrit.cloudera.org:8080/#/c/6389/10/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 996:           info = new SlotInfo(it.nextIndex() - 1);
lol I forgot to put it in the map.  Worked out of the box after that.  Still needs some minor changes (this step only runs after successful constant propagation), but also should run even if no propagation is done.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#4).

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................

IMPALA-5003: [DRAFT] Generic constant propagation in planner

Rather than specialize the constant propagation framework to be specific
to a particular node type, create a generic framework for migrating
constants between conjuncts.  Right now it is still hard coded as being
embedded in HdfsScanNode, and a few other places where it potentially
matters for partition selection.  We could do better in some cases
and eliminate the scan node entirely.

Testing: Expanded the test cases for the planner to achieve constant
propagation.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
13 files changed, 252 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
File fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java:

Line 36:  * id1 > id2 -> id2 < id1
I am going to abandon changes here.  Although this would make it easier to extend to analysis chains, e.g. A <= B <= C <= A -> A = B = C, the complications introduced by the non-inclusive relation make implementing this quite a bit of work.  This change is already large enough and I'd rather keep the scope confined to the two simple steps.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 109: # Aggregate doesn't affect pushing predicates
This test is not specific to constant propagation, so let's leave it out.


Line 124: # Again, the contradiction is derived in the ScanNode
Same here, this test seems to be more about predicate assignment than constant propagation.


Line 139: # Make sure predicates are not pushed into inline views with limits
Same here, this test is about predicate assignment (plenty of coverage elsewhere already).

We should really only be testing the optimizeConjuncts() functionality in isolation here,


Line 161: # We can optimize in some cases
Instead of testing various predicate assignment scenarios, we should focus more on interesting cases for optimizeConjuncts(). Some ideas;
- All conjuncts are optimized to a single "true"
- Conjuncts with NULLs
- Duplicate conjuncts
- Test propagation into non-trivial exprs, e.g. disjunctions, function call exprs
- Test errors during constant propagation (I'm still thinking whether there's an easy way to do this)

For HDFS scans, we should also optimize the collectionConjuncts_, and add 1-2 tests for that case.

Have you tried an extreme example with lots of conjuncts to see how long this optimization takes? Should we consider having a limit on when not to apply constant propagation due to prohibitive optimization cost?


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 127: ---- SCANRANGELOCATIONS
Don't think we need the SCANRANGELOCATIONS and DISTRIBUTEDPLAN sections.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 96:   public static void optimizeConjuncts(List<Expr> conjuncts, Analyzer analyzer,
> a class requires an abstraction, and this doesn't appear to have an abstrac
I'm happy to move this back into Expr.  If it starts looking more like a class or actually needing state it can always become a subclass there.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#15).

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................

IMPALA-5003: Constant propagation in scan nodes

When conjuncts are pushed into table refs from inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs, and in other
cases, we might be able to prune partitions based on expressions which
have now become constant.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
17 files changed, 471 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6389/13/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 36: import org.apache.impala.common.Pair;
Actually unused, will remove


http://gerrit.cloudera.org:8080/#/c/6389/13/fe/src/main/java/org/apache/impala/planner/ValueRange.java
File fe/src/main/java/org/apache/impala/planner/ValueRange.java:

Line 67:   public boolean incorporatePredicate(BinaryPredicate.Operator op, Expr bound) {
I supposed we should remove this code and add it back in a later diff (when it can actually be tested).


http://gerrit.cloudera.org:8080/#/c/6389/13/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 116: 01:AGGREGATE [FINALIZE]
typo


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 25: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#10).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

N.B. - preview diff again.  I added an inequality collation phase which
is not yet tested or finished, but should combine conjuncts such as
a < k1, a < k2  into a < min(k1, k2).

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 607 insertions(+), 110 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6389

to look at the new patch set (#25).

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................

IMPALA-5003: Constant propagation in scan conjuncts

Implements constant propagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
19 files changed, 636 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/25
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 749:       // conjuncts because NULLs are handled differently for CompoundPredicate.AND
This is the comment to which I was referring.


http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> Agree we can address the issue if we want to. I suggest we leave for now be
I am terrified at how invasive that diff could become.  Totally agree it should be in another change.


Line 335: 00:SCAN HDFS [functional.alltypes]
> I think this is a minor oversight in optimizeConjuncts(). If one of the con
Easy fix, will check to make sure this is consistent with existing behavior.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#3).

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................

IMPALA-5003: [DRAFT] Generic constant propagation in planner

Rather than specialize the constant propagation framework to be specific
to a particular node type, create a generic framework for migrating
constants between conjuncts.  Right now it is still hard coded as being
embedded in HdfsScanNode, and a few other places where it potentially
matters for partition selection.

Testing: So far, manual.  Plan is to examine query plans once the full
Substitution and folding work and are migrated into partition selection.

 explain select * from zparted where o_orderkey = 1 and partition_id =
o_orderkey % 100;

ConstantPropagator.java:85] Constant substitution from slot 2:
         partition_id = 1 % 100
ConstantPropagator.java:106] Constant folded result: partition_id = 1

 00:SCAN HDFS [default.zparted]
 partitions=1/100 files=100 size=23.00MB
 predicates: o_orderkey = 1

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
6 files changed, 162 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#2).

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................

IMPALA-5003: [DRAFT] Generic constant propagation in planner

Rather than specialize the constant propagation framework to be specific
to a particular node type, create a generic framework for migrating
constants between conjuncts.  Right now it is still hard coded as being
embedded in HdfsScanNode, but the initial support appears to be working.

Testing: So far, manual.  Plan is to examine query plans once the full
Substitution and folding work and are migrated into partition selection.

 explain select * from zparted where o_orderkey = 1 and partition_id =
o_orderkey % 100;

ConstantPropagator.java:85] Constant substitution from slot 2:
         partition_id = 1 % 100
ConstantPropagator.java:106] Constant folded result: partition_id = 1

 00:SCAN HDFS [default.zparted]
 partitions=1/100 files=100 size=23.00MB
 predicates: o_orderkey = 1

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
3 files changed, 121 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 18:

(4 comments)

Basically done, final nits

http://gerrit.cloudera.org:8080/#/c/6389/18//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5003: Constant propagation in scan nodes
Constant propagation in scan conjuncts


Line 9: When conjuncts are pushed into table refs from inline views, they can
Wording is slightly confusing because we don't "push conjuncts into table refs", we have specific terms for those things. How about saying something like (amend as you see fit obviously):

Implements constant propagation within a list of conjuncts and applies the optimization to scan conjuncts. The optimization is applied during planning to take the complete list of applicable conjuncts into account. In particular, we want to simplify conjuncts to enable partition pruning....


http://gerrit.cloudera.org:8080/#/c/6389/18/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 922:   // Arbitrary cost bound for propagation
How about:

// Arbitrary max number of exprs considered during constant propagation to limit the cost due to the O(N^2) complexity.


Line 923:   private final static int CONST_PROPAGATION_LIMIT = 200;
CONST_PROPAGATION_EXPR_LIMIT


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#6).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.

Note: currently broken due to optimizing away column refs for HBase
nodes when FALSE conjuncts are present.  Working on a fix.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  More tests being added (for HBase and Kudu)

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/visa
13 files changed, 219 insertions(+), 43 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 25: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#8).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Only thing missing right now is Kudu tests.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
15 files changed, 356 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 24: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 96:   public static void optimizeConjuncts(List<Expr> conjuncts, Analyzer analyzer,
> Expr is already a pretty long class and is more of a utility classes for ex
a class requires an abstraction, and this doesn't appear to have an abstraction beyond what the public function does.

this doesn't look like a class to me. i'm happy to discuss this in person.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has submitted this change and it was merged.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


IMPALA-5003: Constant propagation in scan conjuncts

Implements constant propagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Reviewed-on: http://gerrit.cloudera.org:8080/6389
Tested-by: Impala Public Jenkins
Reviewed-by: Alex Behm <al...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
19 files changed, 636 insertions(+), 93 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Alex Behm: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 26
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 22:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/507/

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................


Patch Set 3:

(17 comments)

We still need tests. I suggest either adding a new unittest similar to ExprRewriteTest, or just adding the tests in there.

http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
File fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java:

Line 79:     rewriter_ = ExprRewriter.createExprRewriter(
I think we don't actually need the rewriter_ as a member, and instead we can create it on demand in analyze() below using the createExprRewriter() in Analyzer (see my other comment on adding this convenience function)


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2777:                       + " to " + slotIds.second.toString());
I'm not even sure this message helps at all. Remove?


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 1: 
extra newline


Line 44: public final class ConstantPropagator {
brief class comment


Line 51:   //public static ConstantPropagator INSTANCE = new ConstantPropogator();
remove


PS3, Line 54:   
comment on thrown exception and state of conjuncts in case of error


Line 56:   public static boolean propagateConstants(List<Expr> conjuncts, Analyzer analyzer)
make private, we can still make it public later when the need arises, but for now we only expect optimizeConjuncts() to be called


Line 63:     while (changed) {
brief comment, maybe with a simple example to show why we need to do this repeatedly


Line 80:             try {
no need for try/catch if we're just going to throw e anyway


Line 87:           output.set(i, rewritten);
Seems cleaner to only do this when there was a change. That's how our expr rewrites generally work (returned expr == original expr, if there were no changes)


Line 90:       if (changed) Collections.copy(conjuncts, output);
Move this outside of the loop. You can iterate over output in L65


Line 95:   // Propogates constants, removes duplicates and optionally performs expr rewriting
Mention that errors in the rewrite are logged, but ignored, and the 'conjuncts' remain unchanged then


Line 101:         for (int i = 0; i < conjuncts.size(); ++i) {
rewriter.rewriteList()


Line 107:     catch (AnalysisException e) {
goes after the } in L106


Line 108:       LOG.error("Not able to analyze after rewrite: " + e.toString());
LOG.warn and print the conjuncts in this message


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
To clean up this retrieval of enable_expr_rewrites, consider adding an createExprRewriter() function in Analyzer.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   public enum RuleSet {
I see what you are doing, but I don't think we need to be quite as general yet. The ExprRewriter framework is relatively new and there are still many missing rules. I'd like to keep it as simple as possible to add new rules and make modifications to how/where rules are applied.

The downside of the current approach is that rule implementers need to be aware of all the different uses of the ExprRewriter to pick the set of RuleSets it should be added to.

How about we just keep the two original rule sets? The required ones and then required + all others. I understand we may sometimes be invoking rules without a hope of them firing.

If excessive rewrites attempts become a perf problem, we can still revisit.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 15:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 968:       BitSet candidates = new BitSet(conjuncts.size());
Alex, thanks for the BitSet suggest - this worked out far cleaner than any other solution.


Line 985:           if (index < 0) continue;
This can be Preconditions.checkState(index >= 0) now.  In the prior form of the code, changed was a set of exprs, so we could have multiple rewrites of the same index.  With a BitSet, this is no longer true.


Line 998:           // because they can't be evaluated if expr rewriting is turned off.
I can't see a good way to test this with the fe tests.  Should we write a custom cluster or ee test that validates this?


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 101: # XXX - this seems correct, but is this a legal transformation?
I'm fairly convinced at this point that considering slotrefs through implicit casts is legal.  The only case I can think of where this could be a problem is if the cast is applied to the constant, which results in truncation, but I don't believe we can create those kind of casts implicitly.  Consider:

select * from functional.alltypes a
where a.tinyint_col = cast(10000 as tinyint) and a.tinyint_col CONDITION cast(10000 as tinyint)

With our current integer casting situation, I think we truncate instead of convert to NULL, so this results in tinyint_col having a value, and since the cast can be evaluated as a literal, we may perform propagation.  No matter how I try to break this expression, though, it seems to always evaluate exactly the same as our truncation rule, so I think this is a legal transformation.


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/joins.test
File testdata/workloads/functional-planner/queries/PlannerTest/joins.test:

Line 394: 02:HASH JOIN [INNER JOIN]
This is a very sad plan.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 25:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/522/

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#17).

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................

IMPALA-5003: Constant propagation in scan nodes

When conjuncts are pushed into table refs from inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs, and in other
cases, we might be able to prune partitions based on expressions which
have now become constant.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 629 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 19: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6389/5//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5003: Generic constant propagation in planner
Instead of "Generic constant propagation in planner" I'd propose something like "Constant propagation within scan conjuncts"

The constant propagation is not really "generic" because we are consciously missing out on a lot of opportunities.


Line 9: Rather than specialize the constant propagation framework to be specific
Maybe rephrase to what the new code does and where it is applied, e.g.,

Adds a new ConstantPropagator class that propagates constants within a list of conjuncts... and so on


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 115:           if (e.isConstant() && !Expr.IS_FALSE_LITERAL.apply(e)) it.remove();
Looks wrong. We cannot just drop the 'false' predicate. I think we should either leave the predicate or address the TODO.

In any case, I don't think we should remove constant 'false' predicates here. The caller should decide what to do with them.

We can, however, remove constant 'true' predicates.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> Let me see if that comes out cleaner.
Did it not come out cleaner?


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
Have you tried running a few queries that end up with a constant 'false' predicate? I think in those cases we should generate an EmptySetNode instead of a ScanNode.


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   public enum RuleSet {
This enum seems to add an unnecessary indirection. It's the same as "enable_expr_rewrites", so let's use that directly.

We can make it more generic in the future if the need arises.


http://gerrit.cloudera.org:8080/#/c/6389/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 16: # Test multiple forward propagataion
Add a similar test for Kudu in one of the Kudu .test files.

typo: propagation


Line 18: where a.int_col1 = 10 and a.int_col2 = a.int_col1 + 1 and a.int_col3 = a.int_col2 * 5
flip the lhs/rhs of some predicates to test the expr normalization within the constant propagator


Line 27:    predicates: a.int_col1 = 10, a.int_col2 = 11, a.int_col3 = 55, a.int_col4 = -385 
whitespace


Line 43: # Another impossibility
What's the impossibility?


Line 56: # An obviously empty query where the predicate is visible
Can we move this into an HBase .test file? It's sometimes better to keep the tests separate. For example, we'd need to skip this entire test if Kudu was not supported in some dev environment (based on the KUDU_IS_SUPPORTED env var)


Line 67: ====
Add negative test cases with arbitrary exprs, e.g. slot refs with explicit and implicit casts to show that those are not considered for propagation.

Examples:
// explicit cast
where cast(int_col as string) = 'abc' and cast(int_col as string) > 'xyz'

// implicit cast
where tinyint_col = 10000 and tinyint_col < 10000

// arbitrary expr
where to_date(timestamp_col) = '2017-12-20' and to_date(timestamp_col) substr(replace(to_date(timestamp_col), '-', ''), 1, 4) = '2016'


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> It is fixable.  I think this is because the resulting Expr ends up with no 
Agree we can address the issue if we want to. I suggest we leave for now because the fix is possibly invasive and the gain is minimal. It's also unrelated to your change.


Line 335: 00:SCAN HDFS [functional.alltypes]
> bool_col = null is stripped from conjunct evaluation.  bool_col IS null is 
I think this is a minor oversight in optimizeConjuncts(). If one of the conjuncts becomes NULL, then it has the same as a FALSE literal, but in that case optimizeConjuncts() will return true.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#9).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
17 files changed, 462 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................


Patch Set 11:

(15 comments)

Thanks for the review!

http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 924:    * Propagates constant expression of the form <slot ref> = <constant> to
> expressions
Done


Line 931:     for (Expr expr: conjuncts) {
> It's probably not that important to worry about perf, but I think it should
I thought about this.  There are certain problems.

1.  If we want to find contradictions, we need to apply the smap to even <slot reg> = <constant> exprs, for other slot refs.  This is what produces <5> = <7>, etc..
2.  It is and this is what I originally wanted to do.  But we need to apply the smap to all exprs except the original expr itself.  There does not seem to be any machinery to apply an smap to all conjuncts except for the conjunct from which the smap was built, nor an obvious and uncomplicated way to build it.
3.  This we should probably do.  I had a mock-up that did this at one point in time but abandoned it for a simpler design that allowed for subsequent or potentially concurrent optimization.  Now that complexity has crept in at least another dimension (inequality elision), it should be more obvious how to do this in a way that doesn't prevent the latter optimization at a future point in time.


Line 964:   private static boolean collateInequalities(List<Expr> conjuncts, Analyzer analyzer) {
> Please avoid scope creep and leave this out. Feel free to file a new JIRA f
Will do and I agree.  Will cause some minor test refactoring, but that is not a big problem.


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 968:    * to construct a ValueRange, 
> whitespace
will fix comment


Line 993:       if (!(comp.getOp() == BinaryPredicate.Operator.GT)  &&
> use !=
Done


Line 996:           !(comp.getOp() == BinaryPredicate.Operator.LE)) continue;
> use {} for multi-line ifs
Done


Line 1134:     Expr.optimizeConjuncts(preds, analyzer);
> I don't think we should add this in this patch because the behavior could b
Indeed, optimizing pre-substituted exprs was a bug (caused test failures due to unassigned conjuncts which are fixed in the latest revision). (See line 1157 just below - eliminating duplicates and replacing exprs is not valid here!).


Line 1252:     TupleDescriptor tupleDesc = tblRef.getDesc();
> inline into the only use in L1256
Done


Line 1256:     TupleId id = tupleDesc.getId();
> id -> tid
Done


Line 1266:     if (!Expr.optimizeConjuncts(conjuncts, analyzer)) {
> Only if expr rewrites are enabled?
Optimization should still be done in the form of eliminating duplicates, even if rewrites are disabled.  It was my intention to make rewrites only happen if they were enabled, if that is not the case, will fix.


Line 1267:       // Conjuncts implied false; convert to emptySetNode
> EmptySetNode
Done


Line 1270:       return node;
> nice!
:)


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/ValueRange.java
File fe/src/main/java/org/apache/impala/planner/ValueRange.java:

Line 107:         !(upperBound_ instanceof LiteralExpr) || !(lowerBound_ instanceof LiteralExpr))
> use {} for multi-line ifs
Done


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   private Expr rewrite(Expr expr, Analyzer analyzer, boolean reanalyze)
> Why are these changes needed? Can we avoid them?
Let's discuss in person the best approach here.  The reason for re-analyze is that the higher level code has no idea what has been rewritten, so it can only attempt to reanalyze everything.


Line 110:   public void reset() {
> single line
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, Alex Behm,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6389

to look at the new patch set (#24).

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................

IMPALA-5003: Constant propagation in scan conjuncts

Implements constant proopagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/data-source-tables.test
19 files changed, 636 insertions(+), 93 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/24
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 749:       // conjuncts because NULLs are handled differently for CompoundPredicate.AND
> This is the comment to which I was referring.
Thanks. I'm not sure I even follow this comment. It doesn't seem to apply to this patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: [DRAFT] Generic constant propagation in planner

Posted by "Marcel Kornacker (Code Review)" <ge...@cloudera.org>.
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-5003: [DRAFT] Generic constant propagation in planner
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 96:   public static void optimizeConjuncts(List<Expr> conjuncts, Analyzer analyzer,
instead of creating a class, why not just move this as static functions into Expr?


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 15:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/6389/15/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

Line 923:    * Propagates constant expressions of the form <slot ref> = <constant> to
> Explain candidate param
Done


Line 924:    * other uses of slot ref in all expressions in conjuncts; returns a BitSet with
> in all expressions in conjuncts -> in the given conjnucts
Done


Line 942:         // Don't rewrite with our own substitution!  We may have already processed
> Explanation seems more complicated than necessary. Clearly it would be wron
Done


Line 960:    * Returns true if the conjuncts may be true and false if a contradiction has
> Shorter: Returns false if a contradiction has been implied, true otherwise.
Done


Line 980:           // applied by the rewriter.  We must also re-analyze result exprs to
> I think we can remove everything after "We must also re-analyze  ..." to ke
Done


Line 985:           if (index < 0) continue;
> Agree. Makes sense.
Done


Line 988:           // Reset Expr to force updating cost
> Re-analyze expr to add implicit casts, resolve function signatures, and com
I made this slightly simpler due to required indentation level


Line 998:           // because they can't be evaluated if expr rewriting is turned off.
> You should be able to do this with a PlannerTest. For example, look at Plan
Done


http://gerrit.cloudera.org:8080/#/c/6389/11/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1266: 
> Looks like the constant propagation is applied even when expr rewrites are 
Done


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 101: # XXX - this seems correct, but is this a legal transformation?
> I agree that implicit casts on both sides seem ok.
Done


Line 161: # We can optimize in some cases
> Instead of testing various predicate assignment scenarios, we should focus 
Done.  I haven't tried an extreme example, but I added a few crazy ones to the test.


http://gerrit.cloudera.org:8080/#/c/6389/15/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 127: ---- SCANRANGELOCATIONS
> Don't think we need the SCANRANGELOCATIONS and DISTRIBUTEDPLAN sections.
nope


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#11).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

I also added an inequality collation phase which is now partially
tested and will combine conjuncts such as a < k1, a < k2  into
a < min(k1, k2), as well as detect equivalence from a >= k, a <= k,
and determine conflicting bounds requirements to be false.

This could be expanded to do analysis against other slotrefs in the
future, but this should probably be saved for another diff.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.  Some manual testing for inequality
conjuncts but nothing formal yet.

Query: explain select * from functional_hbase.widetable_250_cols a
where a.int_col1 > 1 and a.int_col1 <= 20 and a.int_col1 < 50 and
a.int_col1 > 2
+-------------------------------------------------------------------
| Explain String
+-------------------------------------------------------------------
| Estimated Per-Host Requirements: Memory=1.00GB VCores=1
| PLAN-ROOT SINK
| |
| 01:EXCHANGE [UNPARTITIONED]
| |
| 00:SCAN HBASE [functional_hbase.widetable_250_cols a]
|    predicates: a.int_col1 <= 20, a.int_col1 > 2
+-------------------------------------------------------------------
Fetched 10 row(s) in 0.08s

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/main/java/org/apache/impala/rewrite/NormalizeBinaryPredicatesRule.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
19 files changed, 644 insertions(+), 117 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 21:

Got a green light test run for this:

http://sandbox.jenkins.cloudera.com/view/Impala/view/Private-Utility/job/impala-private-build-and-test/5527/console

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#7).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Only thing missing right now is Kudu tests.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
A testdata/workloads/visa
16 files changed, 357 insertions(+), 86 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#12).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

I also added an inequality collation phase which is now partially
tested and will combine conjuncts such as a < k1, a < k2  into
a < min(k1, k2), as well as detect equivalence from a >= k, a <= k,
and determine conflicting bounds requirements to be false.

This could be expanded to do analysis against other slotrefs in the
future, but this should probably be saved for another diff.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Query: explain select * from functional_hbase.widetable_250_cols a
where a.int_col1 > 1 and a.int_col1 <= 20 and a.int_col1 < 50 and
a.int_col1 > 2
+-------------------------------------------------------------------
| Explain String
+-------------------------------------------------------------------
| Estimated Per-Host Requirements: Memory=1.00GB VCores=1
| PLAN-ROOT SINK
| |
| 01:EXCHANGE [UNPARTITIONED]
| |
| 00:SCAN HBASE [functional_hbase.widetable_250_cols a]
|    predicates: a.int_col1 <= 20, a.int_col1 > 2
+-------------------------------------------------------------------
Fetched 10 row(s) in 0.08s

Right now I'm seeing some suspicious test output, specifically the
SCANRANGELOCATIONS seem to be arbitrarily modified:

section SCANRANGELOCATIONS of query:
select * from functional_hbase.alltypesagg
where bigint_col is not null and bool_col = true
Actual does not match expected result:
  HBASE KEYRANGE port=16201 <unbounded>:1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  HBASE KEYRANGE port=16202 1:5
  HBASE KEYRANGE port=16202 9:<unbounded>
  HBASE KEYRANGE port=16203 5:9
NODE 0:

Expected:
  HBASE KEYRANGE port=16201 <unbounded>:3
  HBASE KEYRANGE port=16202 3:7
  HBASE KEYRANGE port=16203 7:<unbounded>
NODE 0:

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/ValueRange.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hbase.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/inline-view-limit.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
21 files changed, 640 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................


Patch Set 3:

(12 comments)

I'll post a WIP update to this diff as there was some refactoring that turned out to be necessary, and some new revelations about HBase key filtering have come up.

http://gerrit.cloudera.org:8080/#/c/6389/5//COMMIT_MSG
Commit Message:

Line 7: IMPALA-5003: [DRAFT] Generic constant propagation in planner
> Instead of "Generic constant propagation in planner" I'd propose something 
Done


Line 9: Rather than specialize the constant propagation framework to be specific
> Maybe rephrase to what the new code does and where it is applied, e.g.,
Done


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
File fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java:

Line 115
> Looks wrong. We cannot just drop the 'false' predicate. I think we should e
No, you missed the ! - we certainly can't drop the false predicate, but we do want to drop constant true predicates.  I can probably (e.isConstant() && Expr.IS_TRUE_LITERAL) to make this more clear - this skips dropping constants that don't evaluate to bools, but there shouldn't be any such conjuncts anyway.


http://gerrit.cloudera.org:8080/#/c/6389/3/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> Did it not come out cleaner?
It did, but required some refactoring.


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1217:     ConstantPropagator.optimizeConjuncts(conjuncts, analyzer,
> Have you tried running a few queries that end up with a constant 'false' pr
In practice this is more complicated.  In many places we've already created a ScanNode of some sort and end up adding impossible false filters on to it.  For HBase key filtering, this is actually made worse by optimizing (key = 'a' && false) => (false), and then the key = 'a' conjunct is not picked up by key range optimization and we end up with an unbounded key range scan across all data on all nodes :(

I'm trying to rework so that this converts to an EmptySetNode, but this is actually more difficult than it should be.  Ideally we would pull the conjunct optimization to figure out impossible scans first and have a generic way to create an EmptySetNode similar across all scan types, but some more refactoring would be needed to do that cleanly (turns out this already works for inline views).


http://gerrit.cloudera.org:8080/#/c/6389/5/fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
File fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java:

Line 48:   public enum RuleSet {
> This enum seems to add an unnecessary indirection. It's the same as "enable
Done


http://gerrit.cloudera.org:8080/#/c/6389/5/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 16
> Add a similar test for Kudu in one of the Kudu .test files.
Will do.

Also, this word 'propagation' is my nemesis of all words in the English language.


Line 18
> flip the lhs/rhs of some predicates to test the expr normalization within t
Done


Line 27
> whitespace
I noticed this too after the diff was sent


Line 43
> What's the impossibility?
copypasta mistake


Line 56
> Can we move this into an HBase .test file? It's sometimes better to keep th
Done


Line 67
> Add negative test cases with arbitrary exprs, e.g. slot refs with explicit 
I think we also want to test with analytic expressions and functions returning bool as conjuncts


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 22: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#19).

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................

IMPALA-5003: Constant propagation in scan conjuncts

Implements constant proopagation within conjuncts and applies the
optimization to scan conjuncts and collection conjuncts within Hdfs
scan nodes.  The optimization is applied during planning.  At scan
nodes in particular, we want to optimize to enable partition pruning.
In certain cases, we might end up with a FALSE conditional, which
now will convert to an EmptySet node.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 630 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/19
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes and inline views

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#14).

Change subject: IMPALA-5003: Constant propagation in scan nodes and inline views
......................................................................

IMPALA-5003: Constant propagation in scan nodes and inline views

When conjuncts are pushed into table refs and inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
17 files changed, 471 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#18).

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................

IMPALA-5003: Constant propagation in scan nodes

When conjuncts are pushed into table refs from inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs, and in other
cases, we might be able to prune partitions based on expressions which
have now become constant.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 630 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/18
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 22: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/507/

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
In testing this, I discovered the following:

Query: explain select count(*) from
  (select * from functional.alltypes where int_col = 10) T
where (coalesce(NULL, T.int_col) + random() * T.int_col = 100 OR coalesce(NULL, T.int_col) + T.int_col = 20)
ERROR: NotImplementedException: Unsupported non-deterministic predicate: TRUE OR 10 + random() * 10 = 100

This looks like a bug, but is it a regression?


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 25:

Got some test failures attributed to databases that do no exist - I think I need to reload data.  Can we try a verify run to see if this is good to go?

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 24:

I'm running the python tests again to look for any other gotchas.

-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-5003: Generic constant propagation in planner

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#5).

Change subject: IMPALA-5003: Generic constant propagation in planner
......................................................................

IMPALA-5003: Generic constant propagation in planner

Rather than specialize the constant propagation framework to be specific
to a particular node type, create a generic framework for migrating
constants between conjuncts.  Right now it is still hard coded as being
embedded in HdfsScanNode, and a few other places where it potentially
matters for partition selection.  We could do better in some cases
and eliminate the scan node entirely.

Testing: Expanded the test cases for the planner to achieve constant
propagation.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
A fe/src/main/java/org/apache/impala/analysis/ConstantPropagator.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/rewrite/ExprRewriter.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
13 files changed, 252 insertions(+), 31 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Alex Behm (Code Review)" <ge...@cloudera.org>.
Alex Behm has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

Line 418:       if (analyzer.getQueryCtx().client_request.getQuery_options().enable_expr_rewrites) {
use analyzer.getQueryOptions()


http://gerrit.cloudera.org:8080/#/c/6389/16/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java:

Line 1274:     if (analyzer.getQueryCtx().client_request.query_options.enable_expr_rewrites) {
use analyzer.getQueryOptions()


http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 159:  select count(*) from
There's still a question as to whether we should impose a limit on the size of the conjunct list we'll optimize (due to N^2 cost), similar to what we do in ExtractCommonConjunctsRule.

I'd suggest running perf on some extreme examples to see what a reasonable cutoff might be. Or you can also adopt the ExtractCommonConjunctsRule threshold.


Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> In testing this, I discovered the following:
This is the current intended behavior. See the bottom of hdfs.test planner test. The behavior is questionable, but it certainly has nothing to do with your change.


Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey
long line


Line 335: 00:SCAN HDFS [functional.alltypes]
Why not an EmptySetNode?


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has uploaded a new patch set (#16).

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................

IMPALA-5003: Constant propagation in scan nodes

When conjuncts are pushed into table refs from inline views, they can
be considered for constant progagation within that node.  In certain
cases, we might end up with a FALSE conditional and now we can
convert ScanNodes to EmptySet nodes when that occurs, and in other
cases, we might be able to prune partitions based on expressions which
have now become constant.

Testing: Expanded the test cases for the planner to achieve constant
propagation.  Added Kudu, datasource, Hdfs and HBase tests to validate
we can create EmptySetNodes.

Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
---
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/Expr.java
M fe/src/main/java/org/apache/impala/analysis/SelectList.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/conjunct-ordering.test
A testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/data-source-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/hdfs.test
M testdata/workloads/functional-planner/queries/PlannerTest/joins.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/predicate-propagation.test
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
18 files changed, 602 insertions(+), 88 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/89/6389/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan nodes

Posted by "Zach Amsden (Code Review)" <ge...@cloudera.org>.
Zach Amsden has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan nodes
......................................................................


Patch Set 16:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6389/16/testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test
File testdata/workloads/functional-planner/queries/PlannerTest/constant-propagation.test:

Line 159:  select count(*) from
> There's still a question as to whether we should impose a limit on the size
I'll probably just re-use the same rule.


Line 245: (coalesce(NULL, T.int_col) + random() * T.tinyint_col = 100 OR
> This is the current intended behavior. See the bottom of hdfs.test planner 
It is fixable.  I think this is because the resulting Expr ends up with no SlotRefs, and as a result conjunct.isBoundBySlotIds(partitionSlots_) returns true, which is questionable.  Changing that would have deep consequences, but we can always build something to collect all the SlotRefs from an Expr and validate they are a non-empty subset of partitionSlots_


Line 289: where l_partkey < l_suppkey and c.c_nationkey = 10 and o_orderkey = o_shippriority and l_suppkey = 10 and o_shippriority = c_nationkey
> long line
Done


Line 335: 00:SCAN HDFS [functional.alltypes]
> Why not an EmptySetNode?
bool_col = null is stripped from conjunct evaluation.  bool_col IS null is not.  I am guessing because 10 = null => null and 10 IS NULL => false.

Is this behavior incorrect?  It certainly is suspect, but there are some comments about how conjunct evaluation treats null conditions specially and I wasn't sure which was the correct behavior here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-5003: Constant propagation in scan conjuncts

Posted by "anujphadke (Code Review)" <ge...@cloudera.org>.
anujphadke has posted comments on this change.

Change subject: IMPALA-5003: Constant propagation in scan conjuncts
......................................................................


Patch Set 23:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6389/23//COMMIT_MSG
Commit Message:

PS23, Line 9: proopagation
nit: typo


-- 
To view, visit http://gerrit.cloudera.org:8080/6389
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I79750a8edb945effee2a519fa3b8192b77042cb4
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: Alex Behm <al...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Marcel Kornacker <ma...@cloudera.com>
Gerrit-Reviewer: Zach Amsden <za...@cloudera.com>
Gerrit-Reviewer: anujphadke <ap...@cloudera.com>
Gerrit-HasComments: Yes