Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2021/07/20 20:02:46 UTC

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17706 )


Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering in which the filters are built
from non-correlated subqueries that return one row, and the filtering
target is the scan node whose rows are qualified by one of the
subqueries. Shown below is one such query, which is normally compiled
into a nested loop join.

 select count(*) from store_sales
 where ss_sales_price < (select avg(ss_wholesale_cost) from store_sales);
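As a rough illustration of the idea (a hypothetical sketch, not code from this patch): once the subquery has produced its single value, the scan only needs a cheap one-sided range check per row, with a strict bound for '<' and an inclusive bound for '<='. The names ScalarRangeFilter and CountSurvivors are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical one-sided filter built from the single value returned by a
// non-correlated scalar subquery, e.g. avg(ss_wholesale_cost).
struct ScalarRangeFilter {
  double upper;          // value produced by the subquery
  bool upper_inclusive;  // true for '<=', false for '<'

  // True if 'v' can satisfy 'column <op> upper'.
  bool MayPass(double v) const {
    return upper_inclusive ? v <= upper : v < upper;
  }
};

// Counts the values of a scanned column that pass the filter, i.e. the rows
// the scan would still hand to the nested loop join above it.
std::size_t CountSurvivors(const std::vector<double>& column,
                           const ScalarRangeFilter& filter) {
  std::size_t survivors = 0;
  for (double v : column) {
    if (filter.MayPass(v)) ++survivors;
  }
  return survivors;
}
```

With prices {5.0, 10.0, 12.5, 9.99} and a subquery value of 10.0, the '<' filter keeps two rows and the '<=' filter keeps three.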

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
3 files changed, 156 insertions(+), 3 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
16 files changed, 395 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9211/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jul 2021 01:12:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts created from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form 'target <op> src_expr';
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
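A minimal sketch of what the InsertFor<op>() methods in step 3 might do for an integer column, assuming a single build-side value and a closed [min, max] range. Int32MinMaxSketch is a hypothetical class, not Impala's MinMaxFilter; it shows why the strict ops LT and GT need care at the edges of the value domain.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical int32 min/max filter: turns the single build-side value 'v'
// of a join predicate 'target <op> v' into a closed range [min_, max_].
class Int32MinMaxSketch {
 public:
  using T = int32_t;
  static constexpr T kLowest = std::numeric_limits<T>::lowest();
  static constexpr T kHighest = std::numeric_limits<T>::max();

  void InsertForLE(T v) { Set(kLowest, v); }   // target <= v
  void InsertForGE(T v) { Set(v, kHighest); }  // target >= v

  // target < v: for integers the largest qualifying value is v - 1. If v is
  // the lowest representable value, no target value can qualify.
  void InsertForLT(T v) {
    if (v == kLowest) { always_false_ = true; return; }
    Set(kLowest, v - 1);
  }

  // target > v: symmetric to InsertForLT().
  void InsertForGT(T v) {
    if (v == kHighest) { always_false_ = true; return; }
    Set(v + 1, kHighest);
  }

  bool AlwaysFalse() const { return always_false_; }
  bool Contains(T x) const { return !always_false_ && min_ <= x && x <= max_; }

 private:
  void Set(T lo, T hi) { min_ = lo; max_ = hi; always_false_ = false; }

  T min_ = kHighest;  // empty range until a value is inserted
  T max_ = kLowest;
  bool always_false_ = true;
};
```

For example, 'target < -2147483648' can never be true for an INT column, so the sketch marks the filter as always false instead of letting v - 1 overflow.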

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in overlap_min_max_filters.test;

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
26 files changed, 1,190 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/16
-- 
To view, visit http://gerrit.cloudera.org:8080/17706
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 20:

(8 comments)

Did a first round. The change looks really nice and promising!

http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@9
PS20, Line 9: patches
patch


http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@10
PS20, Line 10: one row
and only one column, right?


http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@30
PS20, Line 30: InertFor
InsertFor


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.h@104
PS20, Line 104:       if (build_filters) {
Probably the compiler is smart enough to do that, but this 'if' could be moved out of the FOREACH_ROW:

 if (build_filters) {
   FOREACH_ROW(...
 }
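The suggestion amounts to loop unswitching: testing a loop-invariant flag once instead of once per row. A hypothetical before/after sketch, where Row and FilterSketch stand in for the real row batch and filter context types:

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for a build-side row and a runtime filter.
struct Row { int key; };
struct FilterSketch {
  std::vector<int> inserted;
  void Insert(int v) { inserted.push_back(v); }
};

// Original shape: the flag, which cannot change inside the loop, is
// re-tested for every row.
void InsertPerRowCheck(const std::vector<Row>& rows, bool build_filters,
                       FilterSketch* filter) {
  for (const Row& row : rows) {
    if (build_filters) filter->Insert(row.key);
  }
}

// Suggested shape: test the flag once, then run the loop only if needed.
void InsertHoisted(const std::vector<Row>& rows, bool build_filters,
                   FilterSketch* filter) {
  if (!build_filters) return;
  for (const Row& row : rows) filter->Insert(row.key);
}
```

Both versions insert the same values; optimizing compilers can often perform this transformation themselves, as the comment notes.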


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc@157
PS20, Line 157:  To be optimized*
Is it a TODO for the current CR? If not, could you please open a Jira ticket?


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc@259
PS20, Line 259: void NljBuilder::PublishRuntimeFilters(int64_t num_build_rows) {
nit: There is some code duplication with partitioned-hash-join-builder.cc. Is it possible to move some parts to a common place?


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/util/min-max-filter-ir.cc@115
PS20, Line 115:   DCHECK(false) << "StringMinMaxFilter::InsertForLE() is not supported";
Maybe we could use TruncateUp/TruncateDown:
https://github.com/apache/impala/blob/master/be/src/util/string-util.cc#L34:8

And when they don't work, maybe we could just disable the filter by setting AlwaysTrue?
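To make the TruncateUp/TruncateDown idea concrete, here is a hypothetical re-implementation of the concept; it is not the code in string-util.cc, and the names and signatures are illustrative. For a string filter, a lower bound can safely be shortened to a prefix, while an upper bound must be rounded up; when no shorter upper bound exists, the caller would fall back to an always-true filter, as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// A prefix never compares greater than the original string, so it is a
// safe (conservative) lower bound.
std::string TruncateDownSketch(const std::string& s, std::size_t max_len) {
  return s.substr(0, max_len);
}

// An upper bound must not shrink: keep the prefix, increment its last byte
// that is not 0xFF and drop the bytes after it. If every byte is 0xFF there
// is no shorter upper bound, so std::nullopt tells the caller to disable the
// filter (always true) instead.
std::optional<std::string> TruncateUpSketch(const std::string& s,
                                            std::size_t max_len) {
  if (s.size() <= max_len) return s;
  std::string prefix = s.substr(0, max_len);
  for (std::size_t i = prefix.size(); i-- > 0;) {
    unsigned char c = static_cast<unsigned char>(prefix[i]);
    if (c != 0xFF) {
      prefix[i] = static_cast<char>(c + 1);
      prefix.resize(i + 1);
      return prefix;
    }
  }
  return std::nullopt;
}
```

For a predicate 'target <= v' with a long v, the filter's max would become the rounded-up prefix, which is never smaller than v (std::string compares bytes as unsigned char), so no qualifying row is rejected.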


http://gerrit.cloudera.org:8080/#/c/17706/20/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/17706/20/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1073
PS20, Line 1073: .produceOneValueLogically(analyzer, expr);
Can we use isScalarSubquery() instead:
https://github.com/apache/impala/blob/954eb5c85d329af7690698cdc4d0f409260e6d18/fe/src/main/java/org/apache/impala/analysis/Expr.java#L1486:18

This would enable this optimization for more cases, e.g. SELECT without FROM, or SELECT with LIMIT 1.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 18:27:31 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 25:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9277/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 00:08:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts created from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form 'target <op> src_expr';
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
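Step 4 can be pictured as a switch over the comparison op. The sketch below is hypothetical (CompareOp, DoubleRangeFilter and InsertPerCompareOpSketch are illustrative names, not Impala's types) and uses a DOUBLE target, where std::nextafter() turns a strict bound such as 'target < v' into a closed range ending at the largest double below v.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical comparison ops; EQ stands for any op this sketch does not cover.
enum class CompareOp { LE, LT, GE, GT, EQ };

struct DoubleRangeFilter {
  double min = std::numeric_limits<double>::infinity();   // empty range
  double max = -std::numeric_limits<double>::infinity();
  bool Contains(double x) const { return min <= x && x <= max; }
};

// Dispatches on the op of 'target <op> v'. Returns false for an op it does
// not handle, so the caller can skip the filter instead of building a
// wrong one.
bool InsertPerCompareOpSketch(CompareOp op, double v, DoubleRangeFilter* f) {
  constexpr double kInf = std::numeric_limits<double>::infinity();
  switch (op) {
    case CompareOp::LE:
      f->min = -kInf; f->max = v; return true;
    case CompareOp::LT:  // largest double strictly below v
      f->min = -kInf; f->max = std::nextafter(v, -kInf); return true;
    case CompareOp::GE:
      f->min = v; f->max = kInf; return true;
    case CompareOp::GT:  // smallest double strictly above v
      f->min = std::nextafter(v, kInf); f->max = kInf; return true;
    default:
      return false;
  }
}
```

Keeping the range closed in every case lets the scan-side check stay a single 'min <= x && x <= max' test regardless of the original op.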

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
25 files changed, 1,082 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts created from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form 'target <op> src_expr';
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.
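One plausible reason for limiting the default to sorted or partitioned columns (an interpretation, not stated in the patch): a min/max filter pays off mostly when the scan can compare it against per-partition or per-row-group [min, max] statistics and skip whole units. On a sorted column those ranges are narrow and mostly disjoint; on an unsorted one each range tends to cover most of the domain, so little can be skipped. The sketch below (hypothetical names, int64 values) shows the overlap test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical closed range, used both for a unit's statistics and for the
// runtime filter.
struct Range {
  int64_t min;
  int64_t max;
};

// Filter range for 'target <= v'.
Range UpperBounded(int64_t v) {
  return Range{std::numeric_limits<int64_t>::lowest(), v};
}

// Two closed ranges overlap unless one ends before the other starts.
bool Overlaps(const Range& a, const Range& b) {
  return a.min <= b.max && b.min <= a.max;
}

// Indexes of the units (row groups or partitions) that must still be read.
std::vector<std::size_t> UnitsToRead(const std::vector<Range>& unit_stats,
                                     const Range& filter) {
  std::vector<std::size_t> keep;
  for (std::size_t i = 0; i < unit_stats.size(); ++i) {
    if (Overlaps(unit_stats[i], filter)) keep.push_back(i);
  }
  return keep;
}
```

With sorted row groups covering 0..9, 10..19 and 20..29 and 'target <= 12', only the first two are kept; with unsorted groups whose ranges each span most of 0..29, nothing is skipped.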

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
28 files changed, 1,223 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/19

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of such a scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and is later transferred to the
AggregationNode by the planner.

In the BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. Similar to the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts created from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form 'target <op> src_expr';
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
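A hypothetical sketch of step 5, assuming each filter context records the op of its 'target <op> src_expr' predicate and that the build side delivers the single value of src_expr, or NULL. In SQL a scalar subquery that returns no rows also evaluates to NULL, and 'target <op> NULL' is never true, so such a filter can reject every row. Only LE and GE are modeled, and all names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <vector>

enum class Op { LE, GE };

// Hypothetical filter context for one 'target <op> src_expr' predicate.
struct FilterCtxSketch {
  Op op;
  bool always_false = true;  // nothing passes until a bound is inserted
  int64_t min = 0;
  int64_t max = 0;
};

// Inserts the single build-side value into every filter context.
void InsertRuntimeFiltersSketch(const std::optional<int64_t>& src_value,
                                std::vector<FilterCtxSketch>* ctxs) {
  for (FilterCtxSketch& ctx : *ctxs) {
    if (!src_value.has_value()) {  // 'target <op> NULL' is never true
      ctx.always_false = true;
      continue;
    }
    const int64_t v = *src_value;
    ctx.always_false = false;
    if (ctx.op == Op::LE) {
      ctx.min = std::numeric_limits<int64_t>::lowest();
      ctx.max = v;
    } else {
      ctx.min = v;
      ctx.max = std::numeric_limits<int64_t>::max();
    }
  }
}
```

Treating the NULL case explicitly matters because an unset or "disabled" filter would let every row through, while the query's predicate actually rejects them all.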

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
35 files changed, 1,497 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/26

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which is normally compiled into a nested loop join. The
filtering limits the values of column store_sales.ss_sales_price to
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts created from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form 'target <op> src_expr';
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
28 files changed, 1,240 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/20
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9188/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 19:38:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the
range [-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filters are extended for the nested loop
join.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
15 files changed, 391 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 28:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17706/28/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/17706/28/be/src/util/min-max-filter-test.cc@556
PS28, Line 556:   // final range in the filter is [MAX_BOUND_STRING, MAX_BOUND_STRING]. 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Aug 2021 19:18:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#32). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In FE, the fact that the above scalar subquery exists is recorded
in a flag in InlineViewRef in analyzer and later on transferred to
AggregationNode in planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. NljBuilderConfig is populated with filter descriptors from the nested
    loop join plan node via NljBuilder::CreateEmbeddedBuilder() (similar
    to hash join), or in NljBuilderConfig::Init() when the sink config
    is created (for the separate builder case);
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicate target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
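The steps above can be illustrated with a simplified, hypothetical integer filter. The `Int64RangeFilter` class, the `CompareOp` enum and the use of `int64_t` are assumptions made for illustration. They are not the MinMaxFilter API in this patch. The sketch shows only the core idea: for a join predicate `target <op> v` with a single build-side value `v`, the insertion fixes one end of the filter range and leaves the other end at the type's limit. A dispatcher in the spirit of InsertPerCompareOp() then selects the method based on the comparison op.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical comparison ops for a join predicate "target <op> src_expr".
enum class CompareOp { LE, LT, GE, GT };

// Simplified, hypothetical min/max filter over int64_t values. It starts
// "always false" (min_ > max_), and each insertion widens the range to cover
// every target value that could satisfy the predicate.
class Int64RangeFilter {
 public:
  static constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  static constexpr int64_t kMax = std::numeric_limits<int64_t>::max();

  // target <= v: only the upper end is bounded.
  void InsertForLE(int64_t v) { Widen(kMin, v); }
  // target < v: for integers, the largest qualifying value is v - 1.
  void InsertForLT(int64_t v) {
    if (v != kMin) Widen(kMin, v - 1);  // nothing is < INT64_MIN
  }
  // target >= v: only the lower end is bounded.
  void InsertForGE(int64_t v) { Widen(v, kMax); }
  // target > v: for integers, the smallest qualifying value is v + 1.
  void InsertForGT(int64_t v) {
    if (v != kMax) Widen(v + 1, kMax);  // nothing is > INT64_MAX
  }

  // Selects the insertion method from the predicate's comparison op.
  void InsertPerCompareOp(CompareOp op, int64_t v) {
    switch (op) {
      case CompareOp::LE: InsertForLE(v); break;
      case CompareOp::LT: InsertForLT(v); break;
      case CompareOp::GE: InsertForGE(v); break;
      case CompareOp::GT: InsertForGT(v); break;
    }
  }

  bool AlwaysFalse() const { return min_ > max_; }
  bool EvalOne(int64_t x) const { return min_ <= x && x <= max_; }
  int64_t min() const { return min_; }
  int64_t max() const { return max_; }

 private:
  void Widen(int64_t lo, int64_t hi) {
    min_ = std::min(min_, lo);
    max_ = std::max(max_, hi);
  }
  int64_t min_ = kMax;
  int64_t max_ = kMin;
};
```

If integers stand in for the decimal column in the example query, inserting the subquery result with `CompareOp::LE` yields `[INT64_MIN, v]`. This is the one-sided range described in the commit message.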

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

TODO in follow-up patches:
 1. Extend min/max filter for inequality subquery for other use cases
    (IMPALA-10869).

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
38 files changed, 1,586 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/32
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicate target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
28 files changed, 1,200 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/22
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 25:

(14 comments)

Thanks for applying the changes.
I mainly reviewed the string +1/-1 part this time.

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.h@130
PS25, Line 130:   std::string LeastSmallerString() const;
please add some backend tests for these functions


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc
File be/src/runtime/string-value.cc:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@49
PS25, Line 49: string()
nit: could be just return ""; or return {};


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@52
PS25, Line 52:   while (i >= 0 && ptr[i] == 0x00) {
             :     i--;
             :   }
nit: could fit in a single line


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@55
PS25, Line 55: i == -1
probably add UNLIKELY?


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@56
PS25, Line 56: 0xff chars.
Shouldn't it be 0x00 chars if we want a string smaller than '*this'?


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@62
PS25, Line 62:   string result(ptr, i);
             :   // append a char which is ptr[i]-1
             :   result.append(1, (uint8_t)(ptr[i]) - 1);
             :   // copy all remaining characters (which must be 0x00) in [i+1, size()-1]
             :   result.append(ptr + i + 1, len - i - 1);
The appends might need to allocate a bigger buffer for the string. We could pre-allocate the buffer with the required size:

 string result;
 result.reserve(len);
 result.append(ptr, i);
 ...
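Below is a minimal sketch of the quoted decrement step with the buffer reserved up front, as suggested above. It is written as a hypothetical free function over `(ptr, len)` rather than as a `StringValue` member. The fallback for an all-0x00 input (returning the empty string) is an assumption, because that case is exactly what the review thread questions. `UNLIKELY` is defined locally as a stand-in for Impala's macro of the same name.

```cpp
#include <cstdint>
#include <string>

#ifndef UNLIKELY
// Local stand-in for Impala's branch-prediction hint macro.
#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#endif

// Hypothetical free-function version of the quoted step. It decrements the
// last byte that is not 0x00 and keeps the trailing 0x00 bytes, so the result
// has the same length as the input and compares smaller than it.
std::string DecrementLastNonZeroByte(const char* ptr, int len) {
  int i = len - 1;
  while (i >= 0 && ptr[i] == 0x00) i--;
  // Assumed fallback for empty or all-0x00 input (see the lead-in).
  if (UNLIKELY(i == -1)) return {};
  std::string result;
  result.reserve(len);    // one allocation instead of growing on each append
  result.append(ptr, i);  // bytes before position i are unchanged
  result.append(1, static_cast<char>(static_cast<uint8_t>(ptr[i]) - 1));
  result.append(ptr + i + 1, len - i - 1);  // the trailing 0x00 bytes
  return result;
}
```

The result is smaller than the input, but it is not necessarily the largest smaller string. For an input of "b\0", the function returns "a\0", yet "a\xff" also lies between the two. For an empty input, the returned empty string is not smaller than the input, so a caller would need to handle that case separately.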


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@75
PS25, Line 75: i == -1)
probably add UNLIKELY?


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@81
PS25, Line 81: len+1 0x00 chars
Shouldn't it be len+1 0xFF chars if we want a larger string?

Or (len 0xFF) + 0x00 if we want the least larger string?


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@87
PS25, Line 87:   string result(ptr, i);
Same as above. I think we should reserve the required amount of space before appending.


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/runtime/string-value.cc@89
PS25, Line 89:   result.append(1, (uint8_t)(ptr[i]) + 1);
At this point result is already larger than *this. I don't think that we need to copy the remaining parts of the string.
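The same idea applies in the increment direction, following the two review remarks. Once one byte has been incremented, the prefix is already larger, so the tail does not need to be copied. For an all-0xFF input, appending a single 0x00 byte gives a larger string; in plain lexicographic order it is in fact the smallest string larger than the input. This is a hypothetical free function for illustration, not the patch's code.

```cpp
#include <cstdint>
#include <string>

// Hypothetical free function that returns a string comparing larger than the
// input. It increments the last byte that is not 0xFF and drops the tail,
// because the shorter result is already larger at that position. If every
// byte is 0xFF (or the input is empty), it appends a 0x00 byte instead.
std::string IncrementLastNon0xFFByte(const char* ptr, int len) {
  int i = len - 1;
  while (i >= 0 && static_cast<uint8_t>(ptr[i]) == 0xFF) i--;
  std::string result;
  if (i == -1) {
    result.reserve(len + 1);
    result.append(ptr, len);
    result.push_back('\0');
    return result;
  }
  result.reserve(i + 1);
  result.append(ptr, i);  // unchanged prefix
  result.push_back(static_cast<char>(static_cast<uint8_t>(ptr[i]) + 1));
  return result;
}
```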


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-test.cc
File be/src/util/min-max-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-test.cc@572
PS25, Line 572: two0xffChars
It should be the least smaller string of threeNullVal, but two0xffChars is greater than threeNullVal.


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter.h@248
PS25, Line 248: a string of 1 byte of 0x0.
Shouldn't it be the empty string?
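The two comments above on min-max-filter-test.cc and min-max-filter.h both rely on plain lexicographic byte order. The small check below assumes the contents of the two test values from their names. `std::string::compare` orders bytes as `unsigned char`, so two 0xFF bytes compare greater than three 0x00 bytes, and the empty string is smaller than a one-byte 0x00 string.

```cpp
#include <string>

// Test values named after the variables discussed above. Their contents are
// assumed from the names.
const std::string kThreeNullVal("\0\0\0", 3);
const std::string kTwo0xffChars("\xff\xff", 2);

// std::string::compare uses std::char_traits<char>, which compares bytes as
// unsigned char, so this is plain lexicographic byte order.
bool LexLess(const std::string& a, const std::string& b) {
  return a.compare(b) < 0;
}
```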


http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
File fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java:

http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java@457
PS25, Line 457: getIsNonCorrelatedScalarSubquery
nit: maybe just isNonCorrelatedScalarSubquery()?


http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/AggregationNode.java@641
PS25, Line 641: getIsNonCorrelatedScalarSubquery
nit: isNonCorrelatedScalarSubquery()?



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 11:25:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Two new insertion methods, InsertForLE() and InsertForGE(), are added
    to the MinMaxFilter class hierarchy. They maintain only the max or the
    min value, respectively, in a min/max filter;
 2. RuntimeContext::InsertPerCompareOp() calls one of the two new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 3. NljBuilder::InsertRuntimeFilters() calls the new method in 2).

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/decimal-value.h
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
24 files changed, 643 insertions(+), 32 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 32:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9320/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Aug 2021 20:45:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9261/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 09 Aug 2021 19:19:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 27:

Thanks Zoltan for the review. Appreciate it.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 22:06:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 33: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 21 Aug 2021 14:46:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9173/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Jul 2021 16:34:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 32: Code-Review+2

LGTM!


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 21 Aug 2021 08:31:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join.
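As a rough illustration of the range semantics described above, the
C++ sketch below models the filter for `col <= v` as the interval
(-infinity, v]. `ValueRange` and `RangeForLessEqual` are hypothetical
names, not Impala's MinMaxFilter API, and `double` stands in for the
join column's actual type.

```cpp
#include <cassert>
#include <limits>

// Hypothetical interval of values a min/max filter admits.
struct ValueRange {
  double min;
  double max;
  // A value passes the filter if it lies within [min, max].
  bool Contains(double x) const { return x >= min && x <= max; }
};

// For `col <= v`, where v is the single value returned by the scalar
// subquery, the lower end is unbounded: the admitted range is (-infinity, v].
ValueRange RangeForLessEqual(double v) {
  return ValueRange{-std::numeric_limits<double>::infinity(), v};
}
```

For example, with v = 10.5 the range admits 10.5 and any smaller value,
and rejects 10.6.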

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
16 files changed, 395 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/10
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the analyzer records the presence of such a scalar subquery
in a flag on InlineViewRef, and the planner later transfers the flag
to AggregationNode.

In the BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. As with the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts built from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They handle join
    predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
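The insertion methods and the per-op dispatch in steps 3 and 4 might
look roughly like the simplified C++ sketch below. `SimpleMinMaxFilter`,
`CompareOp` and the free function `InsertPerCompareOp` are hypothetical
stand-ins for the patch's classes, and tightening the bound by one for
LT/GT is an assumption that only holds for integer columns.

```cpp
#include <cassert>
#include <limits>

// Hypothetical comparison ops for join predicates `target <op> src_expr`.
enum class CompareOp { LE, LT, GE, GT };

// Hypothetical, simplified min/max filter over 64-bit integers. It starts out
// empty (min_ > max_), so it rejects everything until a value is inserted.
class SimpleMinMaxFilter {
 public:
  // target <= v: qualifying targets lie in [lowest, v].
  void InsertForLE(long long v) { Widen(kLowest, v); }
  // target < v: for integers, qualifying targets lie in [lowest, v - 1].
  void InsertForLT(long long v) {
    if (v != kLowest) Widen(kLowest, v - 1);  // nothing is below lowest
  }
  // target >= v: qualifying targets lie in [v, highest].
  void InsertForGE(long long v) { Widen(v, kHighest); }
  // target > v: for integers, qualifying targets lie in [v + 1, highest].
  void InsertForGT(long long v) {
    if (v != kHighest) Widen(v + 1, kHighest);  // nothing is above highest
  }
  // True if x may satisfy the predicate and must be kept.
  bool Eval(long long x) const { return x >= min_ && x <= max_; }
  bool AlwaysFalse() const { return min_ > max_; }

 private:
  static constexpr long long kLowest = std::numeric_limits<long long>::min();
  static constexpr long long kHighest = std::numeric_limits<long long>::max();

  // Grows the admitted range to include [lo, hi].
  void Widen(long long lo, long long hi) {
    if (lo < min_) min_ = lo;
    if (hi > max_) max_ = hi;
  }

  long long min_ = kHighest;
  long long max_ = kLowest;
};

// Hypothetical counterpart of InsertPerCompareOp(): choose the insertion
// method from the comparison op recorded in the filter descriptor.
void InsertPerCompareOp(SimpleMinMaxFilter* filter, CompareOp op, long long v) {
  switch (op) {
    case CompareOp::LE: filter->InsertForLE(v); break;
    case CompareOp::LT: filter->InsertForLT(v); break;
    case CompareOp::GE: filter->InsertForGE(v); break;
    case CompareOp::GT: filter->InsertForGT(v); break;
  }
}
```

Starting from an empty range means that, in this sketch, a filter that
never receives a value rejects every row, which matches a comparison
against an empty subquery result.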

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
32 files changed, 1,228 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/24
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 31:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9305/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 17 Aug 2021 03:12:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. As with the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts built from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They handle join
    predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.
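One way to see why sorted or partitioned columns benefit most: a scan
can skip a row group (or partition) when the filter range does not
overlap that group's min/max statistics, and those statistics are only
narrow when the data is clustered on the column. The C++ sketch below
assumes per-row-group `ColumnStats`; all names are hypothetical and not
taken from the patch.

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical per-row-group statistics for one column.
struct ColumnStats {
  long long min;
  long long max;
};

// Lower bound used when the filter range is open on the left, as for `col <= v`.
constexpr long long kNoLowerBound = std::numeric_limits<long long>::min();

// A row group can be skipped when [stats.min, stats.max] and the filter
// range [lo, hi] do not overlap.
bool CanSkip(const ColumnStats& stats, long long lo, long long hi) {
  return stats.max < lo || stats.min > hi;
}

// Counts the row groups that still have to be read.
int CountSurvivors(const std::vector<ColumnStats>& groups, long long lo,
                   long long hi) {
  int survivors = 0;
  for (const ColumnStats& g : groups) {
    if (!CanSkip(g, lo, hi)) ++survivors;
  }
  return survivors;
}
```

With four sorted row groups covering [0, 99], [100, 199], [200, 299] and
[300, 399], the range (-infinity, 150] keeps only the first two; if every
group instead spanned [0, 399], as unsorted data tends to, none could be
skipped.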

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
29 files changed, 1,205 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/21
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 32:

IMPALA-10869 (Extend min/max filter for inequality subquery for other use cases) has been filed and is referenced in the commit message.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 32
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Aug 2021 20:21:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9192/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 22:05:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17706/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/17706/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@741
PS3, Line 741:       LOG.error("C0: " + root.getChild(0).debugString() + ", label=" + root.getChild(0).getDisplayLabel());
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17706/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@742
PS3, Line 742:       LOG.error("C1: " + root.getChild(1).debugString() + ", label=" + root.getChild(1).getDisplayLabel());
line too long (107 > 90)


http://gerrit.cloudera.org:8080/#/c/17706/3/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@743
PS3, Line 743:       LOG.error("C1.0: " + root.getChild(1).getChild(0).debugString() + ", label=" + root.getChild(1).getChild(0).getDisplayLabel());
line too long (133 > 90)



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 20:42:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 29:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9291/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 13 Aug 2021 01:58:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9238/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 21:03:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9191/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 22:05:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 27:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9284/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 20:39:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9254/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 20:58:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. As with the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts built from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They handle join
    predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
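Step 3 adds the new methods across a class hierarchy because the filter
has to work for many column types. A hedged C++ sketch of that shape: a
type-erased base class with one virtual method per comparison op and a
templated subclass per column type. `RangeFilterBase` and
`TypedRangeFilter` are hypothetical names, and only LE and GE are shown
to keep it short.

```cpp
#include <cassert>
#include <limits>
#include <memory>

// Hypothetical base class: one virtual insertion method per comparison op.
// Values are passed type-erased so callers need not know the column type.
class RangeFilterBase {
 public:
  virtual ~RangeFilterBase() = default;
  virtual void InsertForLE(const void* v) = 0;  // target <= v
  virtual void InsertForGE(const void* v) = 0;  // target >= v
  virtual bool Eval(const void* v) const = 0;   // may v satisfy the predicate?
};

// Hypothetical typed subclass. T needs operator< and std::numeric_limits
// lowest()/max(); for floating-point types this sketch ignores infinities.
template <typename T>
class TypedRangeFilter : public RangeFilterBase {
 public:
  void InsertForLE(const void* v) override {
    Widen(std::numeric_limits<T>::lowest(), *static_cast<const T*>(v));
  }
  void InsertForGE(const void* v) override {
    Widen(*static_cast<const T*>(v), std::numeric_limits<T>::max());
  }
  bool Eval(const void* v) const override {
    const T& x = *static_cast<const T*>(v);
    return !empty_ && !(x < min_) && !(max_ < x);
  }

 private:
  // Grows the admitted range to include [lo, hi].
  void Widen(const T& lo, const T& hi) {
    if (empty_ || lo < min_) min_ = lo;
    if (empty_ || max_ < hi) max_ = hi;
    empty_ = false;
  }

  bool empty_ = true;
  T min_{};
  T max_{};
};
```

Keeping the per-op methods virtual on the base lets code such as the
nested loop join builder call them without knowing the column type,
while each subclass handles comparisons natively.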

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
23 files changed, 898 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/13
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query, which normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. As with the hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts built from the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to the actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They handle join
    predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
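Step 5 can be pictured as the following control flow: for each filter
context, evaluate the source expression on the build rows (a single row
for a scalar subquery) and widen the range according to the recorded op.
This is a simplified C++ sketch; `Row`, `FilterCtx` and this
`InsertRuntimeFilters` are hypothetical stand-ins, not the signatures in
nested-loop-join-builder.cc, and LT/GT conservatively keep v inside the
range.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <limits>
#include <vector>

enum class CompareOp { LE, LT, GE, GT };

// Hypothetical build-side row; a scalar subquery produces exactly one.
struct Row {
  double value;
};

// Hypothetical filter context: the source expression, the comparison op
// from the filter descriptor, and the range built so far (empty initially).
struct FilterCtx {
  std::function<double(const Row&)> src_expr;
  CompareOp op = CompareOp::LE;
  double min = std::numeric_limits<double>::infinity();
  double max = -std::numeric_limits<double>::infinity();
};

// Evaluates every source expression on the build rows and widens each
// filter's range according to its comparison op. LT and GT keep v itself in
// the range, which is conservative but never drops a qualifying row.
void InsertRuntimeFilters(const std::vector<Row>& build_rows,
                          std::vector<FilterCtx>* ctxs) {
  const double inf = std::numeric_limits<double>::infinity();
  for (FilterCtx& ctx : *ctxs) {
    for (const Row& row : build_rows) {
      const double v = ctx.src_expr(row);
      double lo = -inf;
      double hi = inf;
      switch (ctx.op) {
        case CompareOp::LE:
        case CompareOp::LT: hi = v; break;  // target <= v or target < v
        case CompareOp::GE:
        case CompareOp::GT: lo = v; break;  // target >= v or target > v
      }
      ctx.min = std::min(ctx.min, lo);
      ctx.max = std::max(ctx.max, hi);
    }
  }
}
```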

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
23 files changed, 891 insertions(+), 45 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/12
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the
range (-infinity, avg(ss_wholesale_cost)].

 select count(*) from store_sales
 where ss_sales_price <= (select avg(ss_wholesale_cost) from store_sales);
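
To make the range concrete, here is a minimal sketch assuming a DOUBLE-valued target column and a hypothetical `SingleRangeFilter` type (not Impala's `MinMaxFilter` API). For `target <= v`, only the upper bound comes from the subquery result; the lower bound stays at -infinity, and a scan may drop any row outside the range.

```cpp
#include <cassert>
#include <limits>

// Hypothetical illustration only; not Impala's MinMaxFilter class.
// Models the closed range [min, max] that a probe-side value must fall into.
struct SingleRangeFilter {
  double min = -std::numeric_limits<double>::infinity();
  double max = std::numeric_limits<double>::infinity();

  // Build side: the scalar subquery produced v and the predicate is
  // `target <= v`, so only the upper bound is narrowed.
  void InsertForLE(double v) { max = v; }

  // Scan side: true if the row may satisfy the predicate and must be kept.
  bool MayContain(double target) const {
    return target >= min && target <= max;
  }
};
```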

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
12 files changed, 307 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the
range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
  where ss_sales_price <= (select min(ss_wholesale_cost) from store_sales);

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
14 files changed, 315 insertions(+), 15 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 5
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the range
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    the join predicate target <op> src_expr;
 4. FilterContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
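
    The steps above can be sketched as follows. CompareOp,
    Int32RangeFilter and this free-function InsertPerCompareOp() are
    hypothetical stand-ins for an INT column, not Impala's classes. The
    sketch also assumes that, as with hash-join min/max filters, each
    inserted build value widens the range (with a one-row subquery there
    is only one insert anyway).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical sketch: CompareOp, Int32RangeFilter and InsertPerCompareOp
// are illustrative stand-ins, not Impala's FilterContext/MinMaxFilter code.
enum class CompareOp { LE, LT, GE, GT };

constexpr int64_t kNegInf = std::numeric_limits<int64_t>::min();
constexpr int64_t kPosInf = std::numeric_limits<int64_t>::max();

// Closed range [min_, max_] over an INT column, stored as int64_t so that
// v - 1 and v + 1 never overflow. Starts empty (always false).
class Int32RangeFilter {
 public:
  // Predicate `target <op> v`: each method widens the range to admit every
  // target that satisfies the predicate for this build value.
  void InsertForLE(int32_t v) { Widen(kNegInf, v); }
  void InsertForLT(int32_t v) { Widen(kNegInf, int64_t{v} - 1); }
  void InsertForGE(int32_t v) { Widen(v, kPosInf); }
  void InsertForGT(int32_t v) { Widen(int64_t{v} + 1, kPosInf); }

  bool AlwaysFalse() const { return min_ > max_; }

  // Scan side: false means the row cannot satisfy the predicate.
  bool Eval(int32_t target) const { return target >= min_ && target <= max_; }

 private:
  void Widen(int64_t lo, int64_t hi) {
    min_ = std::min(min_, lo);
    max_ = std::max(max_, hi);
  }
  int64_t min_ = kPosInf;  // min_ > max_ encodes the empty range
  int64_t max_ = kNegInf;
};

// Plays the role described for InsertPerCompareOp(): select the insertion
// method from the comparison op recorded in the filter descriptor.
void InsertPerCompareOp(Int32RangeFilter* filter, CompareOp op, int32_t v) {
  switch (op) {
    case CompareOp::LE: filter->InsertForLE(v); break;
    case CompareOp::LT: filter->InsertForLT(v); break;
    case CompareOp::GE: filter->InsertForGE(v); break;
    case CompareOp::GT: filter->InsertForGT(v); break;
  }
}
```

    Storing the INT bounds as 64-bit values keeps v - 1 and v + 1 from
    overflowing at the edges of the INT domain, so for example
    `target > INT_MAX` correctly yields a filter that rejects every row.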

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
24 files changed, 1,064 insertions(+), 50 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/14
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 25: Code-Review+1

(8 comments)

I went over the code one more time before going on vacation.

I had some comments, but the code looks great overall. Feel free to carry my +1 once those comments are resolved.

http://gerrit.cloudera.org:8080/#/c/17706/25//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17706/25//COMMIT_MSG@42
PS25, Line 42: only for sorted or partitioned
             : join columns.
I think it would make sense to at least turn this on at the file/row group/page level. Probably the overhead of building the filter is negligible in this case.
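
For context, skipping at the file, row group, or page level generally works by comparing the column's [min, max] statistics for that unit against the filter range: the unit can be skipped only when the two ranges are disjoint. A minimal sketch, with a hypothetical ColumnStats struct rather than the Parquet reader's actual types:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical sketch: per-unit (file, row group or page) statistics for one
// column. Not the Parquet reader's actual data structures.
struct ColumnStats {
  int64_t min;
  int64_t max;
};

// A unit can be skipped only when [stats.min, stats.max] and the filter range
// [filter_lo, filter_hi] are disjoint; any overlap may contain matching rows.
bool CanSkip(const ColumnStats& stats, int64_t filter_lo, int64_t filter_hi) {
  return stats.max < filter_lo || stats.min > filter_hi;
}

// Counts how many units a filter range would let the scanner skip.
int CountSkippable(const std::vector<ColumnStats>& units, int64_t filter_lo,
                   int64_t filter_hi) {
  int skipped = 0;
  for (const ColumnStats& s : units) {
    if (CanSkip(s, filter_lo, filter_hi)) ++skipped;
  }
  return skipped;
}
```

For a single-range filter such as `target <= 100`, filter_lo is the type's minimum, so only units whose minimum exceeds 100 are skippable.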


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h@131
PS25, Line 131:   /// This is replaced at runtime with code generated by CodegenInsertRuntimeFilters().
Is this true?

PartitionedHashJoinBuilder has some mechanism for codegen, e.g.:
https://github.com/apache/impala/blob/5a9dcd108d8a1c6f3ea0062d8de750b6e41fb635/be/src/exec/partitioned-hash-join-builder.cc#L1406

But I don't see that for NestedLoopJoinBuilder. Should we remove this comment, or add a TODO referencing a new Jira ticket?

But since we insert only one value from the scalar subquery it's probably not a big deal.


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.h@149
PS25, Line 149: initFilterContexts
nit: InitFilterContexts (Upper Camel Case)


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc@191
PS25, Line 191:     PublishRuntimeFilters(
Don't we want to publish the runtime filters for separate builds as well?


http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/util/min-max-filter-ir.cc@130
PS25, Line 130:   CopyToBuffer(&min_buffer_, &min_, min_.len);
Do we always need to invoke CopyToBuffer() for the min_ value?


http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/17706/25/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@410
PS25, Line 410: miax
max()


http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@409
PS25, Line 409:  < 
Could you please also add a test with '='?


http://gerrit.cloudera.org:8080/#/c/17706/25/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@429
PS25, Line 429: # String data type is not supported as it is impossible to represent the min() and max()
              : # value for the data type.
This seems to be supported now.



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 15:36:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering in which the filters are built
from non-correlated subqueries that return one row and the filtering
target is the scan node to be qualified by one of the subqueries.
Shown below is one such query that normally gets compiled into a nested
loop join.

 select count(*) from store_sales
 where ss_sales_price < (select avg(ss_wholesale_cost) from store_sales);
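
The query above uses a strict `<`. If the filter stores closed ranges [lo, hi], a strict comparison can still be represented by moving the bound by one representable value. A minimal sketch, assuming a DOUBLE value; the helper names are hypothetical. For integer or DECIMAL columns the analogous step would be one unit at the column's scale, which this sketch does not cover.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical helpers: a filter that stores a closed range [lo, hi] can
// represent a strict comparison by moving the bound one representable value.
// Assumes a DOUBLE target column.

// `target < v`  ==>  target <= largest double strictly below v.
double InclusiveUpperForLT(double v) {
  return std::nextafter(v, -std::numeric_limits<double>::infinity());
}

// `target > v`  ==>  target >= smallest double strictly above v.
double InclusiveLowerForGT(double v) {
  return std::nextafter(v, std::numeric_limits<double>::infinity());
}
```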

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
9 files changed, 225 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9190/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 22:00:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 27:

(6 comments)

Change looks good. I just reviewed the NLJ support for Min/Max Filter and left a few nits and questions. I will take a look at the FE in a second round tomorrow.

http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/filter-context.cc@97
PS27, Line 97: void FilterContext::InsertPerCompareOp(TupleRow* row) const noexcept {
Will this be executed for every row on the build side of the join? If yes, I am worried about the switch statement. Do we remove it via codegen?
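
One way to keep a per-row switch out of the build loop without codegen is to resolve the comparison op to a function pointer once per filter. The sketch below uses hypothetical names, not Impala's. It trades the switch for an indirect call, so whether it is faster would need measuring; codegen could instead inline the chosen method. In the scalar-subquery case only one build row is inserted, so the cost matters mainly for a more general NLJ filter.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical sketch; names are illustrative, not Impala's.
enum class CompareOp { LE, LT, GE, GT };

// Only one bound matters per op: LE/LT track an upper bound, GE/GT a lower one.
struct Bounds {
  int64_t lo = INT64_MAX;  // smallest lower bound admitted so far
  int64_t hi = INT64_MIN;  // largest upper bound admitted so far
};

using InsertFn = void (*)(Bounds*, int32_t);

void InsertLE(Bounds* b, int32_t v) {
  if (v > b->hi) b->hi = v;
}
void InsertLT(Bounds* b, int32_t v) {
  if (int64_t{v} - 1 > b->hi) b->hi = int64_t{v} - 1;
}
void InsertGE(Bounds* b, int32_t v) {
  if (v < b->lo) b->lo = v;
}
void InsertGT(Bounds* b, int32_t v) {
  if (int64_t{v} + 1 < b->lo) b->lo = int64_t{v} + 1;
}

// The switch on the op runs once per filter rather than once per row.
InsertFn ResolveInsertFn(CompareOp op) {
  switch (op) {
    case CompareOp::LE: return &InsertLE;
    case CompareOp::LT: return &InsertLT;
    case CompareOp::GE: return &InsertGE;
    case CompareOp::GT: return &InsertGT;
  }
  return nullptr;
}

void InsertBuildRows(CompareOp op, const std::vector<int32_t>& build_values,
                     Bounds* bounds) {
  const InsertFn insert = ResolveInsertFn(op);
  for (int32_t v : build_values) insert(bounds, v);  // no per-row switch
}
```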


http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/join-builder.cc
File be/src/exec/join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/join-builder.cc@148
PS27, Line 148:               << ", fillter details=" << ctx.local_min_max_filter->DebugString()
nit: filter instead of fillter


http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/nested-loop-join-builder.cc@60
PS27, Line 60:     const TPlanFragmentInstanceCtx& instance_ctx = state->instance_ctx();
nit: instance_ctx and filters_produced can be read only once, before the loop.


http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/exec/nested-loop-join-builder.cc@205
PS27, Line 205:   copied_build_batches_.Reset();
Do we need to invoke MinMaxFilter::Close() for the allocated filters?


http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/runtime/string-value.h
File be/src/runtime/string-value.h:

http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/runtime/string-value.h@131
PS27, Line 131:   std::string LeastSmallerString(int max_lken) const;
nit: 'max_len'


http://gerrit.cloudera.org:8080/#/c/17706/27/be/src/runtime/string-value.h@136
PS27, Line 136:   std::string LeastLargerString(int max_len) const;
These two functions seem like utility functions and would be better placed in util/string-util.h.
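
Helpers of this kind let a strict comparison on a STRING column become an inclusive bound: for `target > v`, the inclusive lower bound is the smallest string greater than v that fits within a length limit. The exact semantics of LeastLargerString()/LeastSmallerString() are not shown in this review, so the following is a hypothetical sketch (C++17) with its own name, NextLargerString(), covering only the "larger" direction.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical sketch, not the StringValue methods under review.
// Returns the smallest string that compares strictly greater than `s`
// (std::string compares bytes as unsigned char) and has length <= max_len,
// or std::nullopt if no such string exists. Assumes s.size() <= max_len.
std::optional<std::string> NextLargerString(const std::string& s,
                                            std::size_t max_len) {
  // Room left: appending the smallest byte yields the immediate successor.
  if (s.size() < max_len) return s + '\0';
  // Full length: bump the rightmost byte that is not 0xFF, drop the rest.
  for (std::size_t i = s.size(); i-- > 0;) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    if (c != 0xFF) {
      std::string result = s.substr(0, i + 1);
      result[i] = static_cast<char>(c + 1);
      return result;
    }
  }
  return std::nullopt;  // every byte is 0xFF: nothing larger fits in max_len
}
```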



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 22:16:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9177/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Jul 2021 21:38:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 29:

(4 comments)

Looked at the FE changes and they look quite good. The only major comments I have added are for a few more test scenarios to be included. Other items, which can be addressed in separate patches, are extensions of this work to handle the following (it's good to have JIRAs for them):
1. non-aggregate, non-correlated scalar subqueries.
2. Runtime filters for nested loop joins in general, not just for NC scalar subqueries.

http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
File fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java:

http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java@84
PS29, Line 84:     public boolean isSingleRange() {
I think isRelationalOperator() (from PL terminology) or isComparisonOperator() is a better name. Predicates evaluate to a boolean, so describing them as specifying a range can be confusing.


http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@392
PS29, Line 392:             if (!(child1 instanceof AggregationNode)
Is there any reason we are limiting this to just aggregates? It appears we could extend it to non-aggregate scalar subqueries as well.


http://gerrit.cloudera.org:8080/#/c/17706/29/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17706/29/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@334
PS29, Line 334: ---- QUERY
Can we add a test for more than one NC scalar subquery in a query?


http://gerrit.cloudera.org:8080/#/c/17706/29/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@437
PS29, Line 437: # Negative tests to check out the explain output involving a non-correlated one-row
Can we add negative tests for:
1. non-aggregate, non-correlated scalar subquery
2. correlated scalar subquery



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Aug 2021 17:01:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9230/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Aug 2021 17:43:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 8:

Add comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 21:38:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9137/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jul 2021 21:08:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 7:

Addressed a few issues regarding the setup of filter contexts and filter expression evaluators in BE that prevented the filters from reaching the scan node.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Jul 2021 19:17:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to the range
(-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In FE, the fact that the above scalar subquery exists is recorded
in a flag in InlineViewRef in the analyzer and later transferred to
AggregationNode in the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    the join predicate target <op> src_expr;
 4. FilterContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
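
    Once builders publish their filters, the coordinator
    (be/src/runtime/coordinator.cc is among the changed files) has to
    combine them. If every builder sees the same single subquery value,
    merging is trivial; in general the combined filter must admit every
    value any producer admits. A hypothetical sketch assuming union
    semantics, not Impala's actual aggregation code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: a closed range [lo, hi]; lo > hi means "always false"
// (for example, an instance whose build side produced no rows).
struct RangeFilter {
  int64_t lo;
  int64_t hi;
  bool AlwaysFalse() const { return lo > hi; }
};

// Combines filters published by several producers. The result must admit
// every value that any producer's filter admits, so it takes the bounding
// range of the non-empty inputs.
RangeFilter Merge(const RangeFilter& a, const RangeFilter& b) {
  if (a.AlwaysFalse()) return b;
  if (b.AlwaysFalse()) return a;
  return {std::min(a.lo, b.lo), std::max(a.hi, b.hi)};
}
```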

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
34 files changed, 1,434 insertions(+), 124 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/25
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 33:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7410/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 21 Aug 2021 08:32:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 26:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9282/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 26
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 17:48:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 25:

(1 comment)

Zoltan: 

Completed the rework for all 8/6/2021 comments. Please let me know if you are able to see the replies.

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/util/min-max-filter-ir.cc
File be/src/util/min-max-filter-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/util/min-max-filter-ir.cc@115
PS20, Line 115:   if (UNLIKELY(always_false_)) {
> Maybe we could use TruncateUp/TruncateDown:
Implemented the logic for strings. It turns out to be a little complicated to handle the +1 and -1 logic.

Please review.
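The +1/-1 logic mentioned in the reply above is simple for integers but awkward for strings. Below is a minimal sketch of why. It assumes plain byte-wise lexicographic order (std::string comparison) and is not Impala's StringValue code. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Assumes byte-wise lexicographic order, as std::string::compare provides.
// For "target > s", the smallest string strictly greater than s is s followed
// by a single '\0' byte, so the exclusive bound converts exactly into an
// inclusive lower bound (the string analogue of "+1").
std::string InclusiveLowerBoundForGT(const std::string& s) {
  return s + std::string(1, '\0');
}

// For "target < s" there is in general no largest string below s (below "b"
// lie "a", "az", "azz", ...), so there is no exact analogue of "-1".
// A conservative choice keeps s itself as an inclusive upper bound. The filter
// may then let through rows equal to s, and the join predicate still rejects
// those rows afterwards.
std::string ConservativeUpperBoundForLT(const std::string& s) { return s; }
```

Widening a bound like this is safe for a min/max filter because the filter only needs to avoid rejecting qualifying rows. It does not need to be exact.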



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 23:59:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9208/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jul 2021 16:55:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);
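The sketch below makes the intended semantics of the example query concrete. In-memory vectors stand in for the two scans of store_sales; this setup is hypothetical and is not Impala code. The subquery supplies v = min(ss_wholesale_cost), and only prices in [-infinity, v] survive.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model of the example query:
//   build side: v = min(ss_wholesale_cost)      (the scalar subquery)
//   probe side: keep rows with ss_sales_price <= v
// Once v is known, a scan can drop any price outside [-infinity, v] early.
std::vector<double> PricesPassingFilter(const std::vector<double>& sales_prices,
                                        const std::vector<double>& wholesale_costs) {
  std::vector<double> result;
  // min() over an empty input is NULL, and "price <= NULL" is never true.
  if (wholesale_costs.empty()) return result;
  double v = *std::min_element(wholesale_costs.begin(), wholesale_costs.end());
  for (double price : sales_prices) {
    if (price <= v) result.push_back(price);
  }
  return result;
}
```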

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
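The InsertFor<op>() idea from step 3 can be illustrated for a 32-bit integer column. This is a minimal sketch under an assumption: the class and its members are hypothetical stand-ins, not Impala's MinMaxFilter API. Strict comparisons become inclusive bounds via +1/-1. When no value can qualify (for example target < INT32_MIN), the filter stays "always false".

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified stand-in for an integer min/max filter; the names
// are illustrative and do not match Impala's MinMaxFilter classes.
class Int32RangeFilterSketch {
 public:
  // Starts "always false": no value passes until a range is inserted.
  Int32RangeFilterSketch() = default;

  // target <= v  ->  [INT32_MIN, v]
  void InsertForLE(int32_t v) { Widen(kMin, v); }

  // target < v  ->  [INT32_MIN, v - 1]; if v == INT32_MIN nothing qualifies,
  // so the filter is left unchanged.
  void InsertForLT(int32_t v) {
    if (v == kMin) return;
    Widen(kMin, v - 1);
  }

  // target >= v  ->  [v, INT32_MAX]
  void InsertForGE(int32_t v) { Widen(v, kMax); }

  // target > v  ->  [v + 1, INT32_MAX]; if v == INT32_MAX nothing qualifies.
  void InsertForGT(int32_t v) {
    if (v == kMax) return;
    Widen(v + 1, kMax);
  }

  bool AlwaysFalse() const { return always_false_; }
  bool Passes(int32_t x) const { return !always_false_ && x >= min_ && x <= max_; }
  int32_t min() const { return min_; }
  int32_t max() const { return max_; }

 private:
  static constexpr int32_t kMin = std::numeric_limits<int32_t>::min();
  static constexpr int32_t kMax = std::numeric_limits<int32_t>::max();

  // Grows [min_, max_] so that it also covers [lo, hi].
  void Widen(int32_t lo, int32_t hi) {
    if (always_false_ || lo < min_) min_ = lo;
    if (always_false_ || hi > max_) max_ = hi;
    always_false_ = false;
  }

  int32_t min_ = 0;
  int32_t max_ = 0;
  bool always_false_ = true;
};
```

Checking the overflow case before adjusting avoids undefined behavior on v - 1 or v + 1. It also maps "no value qualifies" onto the always-false state.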

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
28 files changed, 1,222 insertions(+), 49 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/18
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9239/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 21:06:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of the above scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
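Step 4 can be pictured as a dispatch on the comparison op stored in the filter descriptor. This is a hedged sketch: CompareOp, MinMaxFilterInterface, FilterDescSketch, InsertPerCompareOpSketch and RecordingFilter are hypothetical names, not the actual Impala signatures. A recording test double shows which InsertFor<op>() method gets chosen.

```cpp
#include <cassert>
#include <string>

// Comparison op of the join predicate "target <op> src_expr".
enum class CompareOp { LE, LT, GE, GT };

// Hypothetical interface mirroring the InsertFor<op>() methods.
class MinMaxFilterInterface {
 public:
  virtual ~MinMaxFilterInterface() = default;
  virtual void InsertForLE(double v) = 0;
  virtual void InsertForLT(double v) = 0;
  virtual void InsertForGE(double v) = 0;
  virtual void InsertForGT(double v) = 0;
};

// Stand-in for the filter descriptor; only the saved op matters here.
struct FilterDescSketch {
  CompareOp op;
};

// Picks the insertion method that matches the op saved in the descriptor.
void InsertPerCompareOpSketch(const FilterDescSketch& desc,
                              MinMaxFilterInterface* filter, double v) {
  switch (desc.op) {
    case CompareOp::LE: filter->InsertForLE(v); break;
    case CompareOp::LT: filter->InsertForLT(v); break;
    case CompareOp::GE: filter->InsertForGE(v); break;
    case CompareOp::GT: filter->InsertForGT(v); break;
  }
}

// Test double that remembers the last method called and its argument.
class RecordingFilter : public MinMaxFilterInterface {
 public:
  std::string last_call;
  double last_value = 0.0;
  void InsertForLE(double v) override { Record("LE", v); }
  void InsertForLT(double v) override { Record("LT", v); }
  void InsertForGE(double v) override { Record("GE", v); }
  void InsertForGT(double v) override { Record("GT", v); }

 private:
  void Record(const char* op, double v) {
    last_call = op;
    last_value = v;
  }
};
```

Keeping the op in the descriptor lets the builder stay generic. Only the final call site needs to know which bound the predicate implies.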

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
35 files changed, 1,493 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/27
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 25:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@9
PS20, Line 9: patch e
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@10
PS20, Line 10: one val
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17706/20//COMMIT_MSG@30
PS20, Line 30: ddedBuil
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.h
File be/src/exec/nested-loop-join-builder.h:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.h@104
PS20, Line 104:         TupleRow* build_row = build_batch_iter.Get();
> Good point. 
Done


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc@157
PS20, Line 157: 
> Reworked the code in this area. It turns out the boolean flag can be comput
Done


http://gerrit.cloudera.org:8080/#/c/17706/20/be/src/exec/nested-loop-join-builder.cc@259
PS20, Line 259: 
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17706/20/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/17706/20/fe/src/main/java/org/apache/impala/planner/PlanNode.java@1073
PS20, Line 1073: 
> Yes, the flag on scalar subquery now is set to InlineViewRef in analyzer an
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 25
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 00:26:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#31). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of the above scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. NljBuilderConfig is populated with filter descriptors from the
    nested loop join plan node via NljBuilder::CreateEmbeddedBuilder()
    (similar to hash join), or in NljBuilderConfig::Init() when the
    sink config is created (for the separate builder case);
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.
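One plausible reading of why the default covers only sorted or partitioned columns is given here; it is an assumption, not something the commit message states. For such columns, each partition or row group tends to have tight [min, max] statistics. Any unit whose statistics do not overlap the filter range can then be skipped entirely. The sketch shows that overlap test. Interval, Overlaps, RangeForLE and CountSkippableUnits are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Inclusive [lo, hi] interval of column values.
struct Interval {
  int64_t lo;
  int64_t hi;
};

// True if the two inclusive intervals share at least one value.
bool Overlaps(const Interval& a, const Interval& b) {
  return a.lo <= b.hi && b.lo <= a.hi;
}

// Filter range implied by the predicate "target <= v".
Interval RangeForLE(int64_t v) {
  return {std::numeric_limits<int64_t>::min(), v};
}

// Counts the units (partitions or row groups) whose [min, max] stats fall
// entirely outside the filter range, i.e. the ones a scan could skip.
int CountSkippableUnits(const std::vector<Interval>& unit_stats,
                        const Interval& filter_range) {
  int skippable = 0;
  for (const Interval& stats : unit_stats) {
    if (!Overlaps(stats, filter_range)) ++skippable;
  }
  return skippable;
}
```

On an unsorted column, almost every row group's [min, max] would span most of the value domain. The overlap test would then rarely allow skipping, which fits the conservative default.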

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
38 files changed, 1,586 insertions(+), 166 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/31
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 28:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9289/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 12 Aug 2021 19:41:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node to
be qualified by one of the subqueries. Shown below is one such query
that normally gets compiled into a nested loop join. The filtering
limits the values from column store_sales.ss_sales_price to be within
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of the above scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE and
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
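The builder-side loop in step 5 can be sketched for the "target <= src_expr" case, assuming build values arrive as nullable doubles. UpperBoundFilter and BuildFilter are hypothetical names, not Impala code.

- Each non-NULL value widens the upper bound.
- A NULL value contributes nothing, since "target <= NULL" is never true.
- A scalar subquery yields exactly one build row. With several rows, the result is simply the union of their ranges.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical filter for "target <= src_expr": the union of the ranges
// (-infinity, v] over all inserted v is (-infinity, max v].
struct UpperBoundFilter {
  bool always_false = true;  // no row can pass until something is inserted
  double max = 0.0;          // meaningful only when !always_false

  void InsertForLE(double v) {
    if (always_false || v > max) max = v;
    always_false = false;
  }

  bool Passes(double target) const { return !always_false && target <= max; }
};

// Iterates over the build-side values, skipping NULLs.
UpperBoundFilter BuildFilter(const std::vector<std::optional<double>>& build_values) {
  UpperBoundFilter f;
  for (const std::optional<double>& v : build_values) {
    if (v.has_value()) f.InsertForLE(*v);
  }
  return f;
}
```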

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
35 files changed, 1,503 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/29
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 29
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 30:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9302/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Aug 2021 22:22:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9203/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 29 Jul 2021 01:12:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9217/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Sat, 31 Jul 2021 02:52:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9242/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 05 Aug 2021 02:56:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 27:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc
File be/src/exec/nested-loop-join-builder.cc:

http://gerrit.cloudera.org:8080/#/c/17706/25/be/src/exec/nested-loop-join-builder.cc@191
PS25, Line 191:   return Status::OK();
> I'll file a JIRA for future investigation on the subject.
Did some research and found that one non-embedded sink is allocated per fragment. In addition, since NJ's child(1) is broadcast, it is safe to call PublishRuntimeFilters() regardless of whether the sink is embedded or not.

Moved the call outside the if statement.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 27
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 11 Aug 2021 20:16:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9223/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 03 Aug 2021 00:44:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9158/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Jul 2021 20:41:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

[WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query that normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
  where ss_sales_price <= (select min(ss_wholesale_cost) from store_sales);
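To illustrate why such a filter helps the scan, here is a minimal sketch that assumes the scan has per-row-group min/max statistics (as Parquet files provide). With the subquery result v, the predicate ss_sales_price <= v admits only values in (-infinity, v], so any row group whose minimum exceeds v can be skipped. RowGroupStats, FilterRange, RangeForLE() and CanSkipRowGroup() are hypothetical names for illustration, not Impala's API.

```cpp
#include <cassert>
#include <limits>

// Hypothetical per-row-group column statistics (e.g. from a Parquet footer).
struct RowGroupStats {
  double min;
  double max;
};

// Hypothetical closed range [lo, hi] admitted by a min/max filter.
struct FilterRange {
  double lo;
  double hi;
};

// Filter range for "col <= v": every value up to and including v.
FilterRange RangeForLE(double v) {
  return {-std::numeric_limits<double>::infinity(), v};
}

// A row group can be skipped when its [min, max] does not overlap the range.
bool CanSkipRowGroup(const RowGroupStats& stats, const FilterRange& range) {
  return stats.min > range.hi || stats.max < range.lo;
}
```

In this sketch the filter only prunes; rows in surviving row groups still have to be checked against the original predicate.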

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
14 files changed, 354 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/6
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: [WIP] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/9127/ : Initial code review checks failed. See linked job for details on the failure.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Jul 2021 20:32:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query that normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of such a scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. NljBuilderConfig is populated with filter descriptors from the
    nested loop join plan node, either via
    NljBuilder::CreateEmbeddedBuilder() (similar to hash join) or in
    NljBuilderConfig::Init() when the sink config is created (for the
    separate builder case);
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
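The per-operator insertion in steps 3 and 4 can be illustrated with a small sketch. It assumes a single BIGINT column and models only the idea: for a predicate target <op> v, the filter is widened to the range of target values that could satisfy it. Int64RangeFilter, CompareOp and InsertPerOp() are hypothetical stand-ins for MinMaxFilter, the stored comparison op and InsertPerCompareOp(); they are not the patch's actual signatures. Tightening strict inequalities by one is valid only for integer types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical comparison ops for a join predicate "target <op> src_expr".
enum class CompareOp { LE, LT, GE, GT };

// Hypothetical min/max filter over BIGINT values. It starts empty
// (min_ > max_), so nothing passes until a value is inserted.
class Int64RangeFilter {
 public:
  static constexpr int64_t kMin = std::numeric_limits<int64_t>::min();
  static constexpr int64_t kMax = std::numeric_limits<int64_t>::max();

  // target <= v admits [kMin, v].
  void InsertForLE(int64_t v) { Widen(kMin, v); }
  // target < v admits [kMin, v - 1]; empty when v is the smallest value.
  void InsertForLT(int64_t v) { if (v != kMin) Widen(kMin, v - 1); }
  // target >= v admits [v, kMax].
  void InsertForGE(int64_t v) { Widen(v, kMax); }
  // target > v admits [v + 1, kMax]; empty when v is the largest value.
  void InsertForGT(int64_t v) { if (v != kMax) Widen(v + 1, kMax); }

  bool AlwaysFalse() const { return min_ > max_; }
  bool Contains(int64_t x) const { return min_ <= x && x <= max_; }
  int64_t min() const { return min_; }
  int64_t max() const { return max_; }

 private:
  // Extend the admitted range so that it also covers [lo, hi].
  void Widen(int64_t lo, int64_t hi) {
    min_ = std::min(min_, lo);
    max_ = std::max(max_, hi);
  }

  int64_t min_ = kMax;
  int64_t max_ = kMin;
};

// Hypothetical dispatcher: picks the insertion method from the op
// stored with the filter.
void InsertPerOp(Int64RangeFilter& filter, CompareOp op, int64_t src_value) {
  switch (op) {
    case CompareOp::LE: filter.InsertForLE(src_value); break;
    case CompareOp::LT: filter.InsertForLT(src_value); break;
    case CompareOp::GE: filter.InsertForGE(src_value); break;
    case CompareOp::GT: filter.InsertForGT(src_value); break;
  }
}
```

For floating-point or string columns, a conservative variant of this sketch could simply keep the bound inclusive for LT and GT instead of adjusting it by one.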

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

TODO in follow-up patches:
 1. Extend min/max filter for inequality subquery for other use cases
    (IMPALA-10869).

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Reviewed-on: http://gerrit.cloudera.org:8080/17706
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/subquery-rewrite.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
38 files changed, 1,586 insertions(+), 166 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 34
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 33: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 33
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 21 Aug 2021 08:32:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#30). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query that normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of such a scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. NljBuilderConfig is populated with filter descriptors from the
    nested loop join plan node, either via
    NljBuilder::CreateEmbeddedBuilder() (similar to hash join) or in
    NljBuilderConfig::Init() when the sink config is created (for the
    separate builder case);
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.

By default, the feature is turned on only for sorted or partitioned
join columns.
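Partitioned join columns are one of the cases covered by the default. A hypothetical sketch of how a min/max filter on a partition column could prune whole partitions: once the filter range is known, partitions whose key lies outside it need not be read. The Range struct and SurvivingPartitions() are illustrative assumptions, not Impala code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical closed range [lo, hi] admitted by a min/max filter.
struct Range {
  int64_t lo;
  int64_t hi;
};

// Returns the partition keys that survive the filter. Partitions whose key
// lies outside [lo, hi] could be skipped without reading any data files.
std::vector<int64_t> SurvivingPartitions(const std::vector<int64_t>& partition_keys,
                                         const Range& filter) {
  std::vector<int64_t> kept;
  for (int64_t key : partition_keys) {
    if (key >= filter.lo && key <= filter.hi) kept.push_back(key);
  }
  return kept;
}
```

For sorted (non-partition) columns, the same comparison would be made against per-row-group or per-page min/max statistics, which tend to be narrow and non-overlapping when the data is sorted.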

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
36 files changed, 1,572 insertions(+), 157 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/30
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 30
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 24:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9266/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 10 Aug 2021 02:34:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Amogh Margoor (Code Review)" <ge...@cloudera.org>.
Amogh Margoor has posted comments on this change. ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................


Patch Set 31: Code-Review+1

(2 comments)

+1 LGTM. Just left one comment to change the commit message.

http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/17706/29/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@392
PS29, Line 392:             // AggregationNode that implements a non-correlated scalar subquery
> This form is particularly called for in the JIRA (https://gerrit.cloudera.o
For non-aggregate scalar subqueries, such a conversion (e.g. from "select ss_wholesale_cost from tpcds_parquet.store_sales" to "select min(ss_wholesale_cost) from tpcds_parquet.store_sales") may not always be semantically correct, as in "select ss_wholesale_cost from tpcds_parquet.store_sales limit 1". It is fine to handle only aggregate scalar subqueries in this patch, since they may be the most common case, but it's better to mention that in the commit message and also file a JIRA for extending it to non-aggregate ones.
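To make the concern concrete, a small hypothetical sketch: with "limit 1" and no ORDER BY, the scalar v is whichever row comes back first, not necessarily the minimum. A filter built from min() of the same rows would then be tighter than the predicate "price <= v" allows and would prune qualifying rows. CountAtMost() and RowsWronglyPruned() are illustrative helpers, and the first element of the vector merely models the row that "limit 1" happens to return.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Number of prices p with p <= bound.
long CountAtMost(const std::vector<double>& prices, double bound) {
  return static_cast<long>(std::count_if(
      prices.begin(), prices.end(), [bound](double p) { return p <= bound; }));
}

// Rows that satisfy "price <= v" when v is the first subquery row (as
// "limit 1" might return) but that a filter built from min() of the same
// rows would prune. Assumes costs is non-empty.
long RowsWronglyPruned(const std::vector<double>& costs,
                       const std::vector<double>& prices) {
  double first_row = costs.front();
  double min_cost = *std::min_element(costs.begin(), costs.end());
  return CountAtMost(prices, first_row) - CountAtMost(prices, min_cost);
}
```

When the subquery is already an aggregate, the filter is built from exactly the value the predicate compares against, so no such mismatch arises.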


http://gerrit.cloudera.org:8080/#/c/17706/29/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17706/29/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@437
PS29, Line 437: ---- QUERY
> The first one was added. 
Makes sense.



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 31
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 18 Aug 2021 13:56:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one value. In this case, the filters are built from the
results of the subqueries, and the filtering target is the scan node
whose rows are qualified by one of the subqueries. Shown below is one
such query that normally gets compiled into a nested loop join. The
filtering limits the values from column store_sales.ss_sales_price to
the range (-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);

In the FE, the presence of such a scalar subquery is recorded as a
flag in InlineViewRef by the analyzer and later transferred to
AggregationNode by the planner.

In BE, the min/max filtering infrastructure is integrated with the
nested loop join as follows.

 1. Similar to hash join, NljBuilderConfig is populated with filter
    descriptors from the nested loop join plan node via
    NljBuilder::CreateEmbeddedBuilder();
 2. NljBuilder is populated with filter contexts utilizing the filter
    descriptors in NljBuilderConfig. Filter contexts are the interface
    to actual min/max filters;
 3. New insertion methods InsertFor<op>(), where <op> is LE, LT, GE or
    GT, are added to the MinMaxFilter class hierarchy. They are used for
    join predicates of the form target <op> src_expr;
 4. RuntimeContext::InsertPerCompareOp() calls one of the new
    insertion methods above based on the comparison op saved in the
    filter descriptor;
 5. NljBuilder::InsertRuntimeFilters() calls the new methods.
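The builder-side flow of the steps above can be sketched as follows, assuming a BIGINT source expression and only the LE/GE cases. SimpleMinMaxFilter, SimpleFilterContext and InsertAndPublish() are hypothetical simplifications of the MinMaxFilter, filter context and NljBuilder::InsertRuntimeFilters() roles described above, not their real interfaces; "publishing" is reduced to setting a flag.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical simplified filter admitting [lo, hi]; starts empty (lo > hi).
struct SimpleMinMaxFilter {
  int64_t lo = INT64_MAX;
  int64_t hi = INT64_MIN;
  bool published = false;
};

// Hypothetical filter context: the comparison recorded in the filter
// descriptor, the build-side source expression, and the filter itself.
struct SimpleFilterContext {
  bool target_le_src;                        // true: target <= src; false: target >= src
  std::function<int64_t(int64_t)> src_expr;  // evaluates src_expr on a build row
  SimpleMinMaxFilter filter;
};

// Feed every build row into every filter context, then publish each filter.
void InsertAndPublish(std::vector<SimpleFilterContext>& ctxs,
                      const std::vector<int64_t>& build_rows) {
  for (SimpleFilterContext& ctx : ctxs) {
    for (int64_t row : build_rows) {
      int64_t v = ctx.src_expr(row);
      if (ctx.target_le_src) {
        ctx.filter.lo = INT64_MIN;                 // no lower bound
        if (v > ctx.filter.hi) ctx.filter.hi = v;  // widen the upper bound
      } else {
        ctx.filter.hi = INT64_MAX;                 // no upper bound
        if (v < ctx.filter.lo) ctx.filter.lo = v;  // widen the lower bound
      }
    }
    ctx.filter.published = true;  // stand-in for sending the filter to scans
  }
}
```

For a scalar subquery such as the one in the example query, build_rows holds exactly one value, so each filter ends up bounded on one side by that value.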

By default, the feature is turned on only for sorted or partitioned
join columns.

Testing:
 1. Add single range insertion tests in min-max-filter-test.cc;
 2. Add positive and negative plan tests in
    overlap_min_max_filters.test;
 3. Add tests in overlap_min_max_filters_on_partition_columns.test;
 4. Add tests in overlap_min_max_filters_on_sorted_columns.test;
 5. Run core tests.

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/data-sink.h
M be/src/exec/filter-context.cc
M be/src/exec/filter-context.h
M be/src/exec/join-builder.cc
M be/src/exec/join-builder.h
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M be/src/runtime/runtime-filter.h
M be/src/runtime/string-value-test.cc
M be/src/runtime/string-value.cc
M be/src/runtime/string-value.h
M be/src/runtime/timestamp-value.h
M be/src/util/min-max-filter-ir.cc
M be/src/util/min-max-filter-test.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/ScalarType.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_sorted_columns.test
35 files changed, 1,499 insertions(+), 152 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/28
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 28
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Amogh Margoor <am...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17706 )

Change subject: IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans
......................................................................

IMPALA-3430: Runtime filter : Extend runtime filter to support Min/Max values for HDFS scans

This patch enables min/max filtering for non-correlated subqueries
that return one row. In this case, the filters are built from the
results of the subqueries and the filtering target is the scan node
qualified by one of the subqueries. Shown below is one such query,
which normally gets compiled into a nested loop join. The filter
limits the values of column store_sales.ss_sales_price to the range
[-infinity, min(ss_wholesale_cost)].

  select count(*) from store_sales
    where ss_sales_price <=
      (select min(ss_wholesale_cost) from store_sales);
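For the query above, if m is the single value returned by the subquery, only rows with ss_sales_price <= m can qualify. The sketch below assumes, for illustration only, that the scan has per-row-group min/max statistics for the column and uses them to decide whether a row group can be skipped. ColumnStats, FilterRange, BuildRangeForLE and Overlaps are hypothetical names, and double stands in for the column's actual type.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>
#include <vector>

// Hypothetical per-row-group statistics for ss_sales_price.
struct ColumnStats {
  double min;
  double max;
};

// Closed filter range [lo, hi] built from the single subquery row.
struct FilterRange {
  double lo;
  double hi;
};

// For "ss_sales_price <= (select min(ss_wholesale_cost) ...)", the filter
// admits [-inf, m], where m is the subquery's single result value.
// An empty input would yield NULL in SQL; this sketch does not model that.
FilterRange BuildRangeForLE(const std::vector<double>& wholesale_costs) {
  assert(!wholesale_costs.empty());
  double m = *std::min_element(wholesale_costs.begin(), wholesale_costs.end());
  return {-std::numeric_limits<double>::infinity(), m};
}

// A row group can be skipped only when its [min, max] cannot intersect the
// filter range; otherwise its rows must still be read and evaluated.
bool Overlaps(const ColumnStats& stats, const FilterRange& range) {
  return stats.max >= range.lo && stats.min <= range.hi;
}
```

With wholesale costs {12.5, 7.0, 9.25}, m is 7.0, so a row group whose prices span [7.5, 30.0] cannot contain a qualifying row, while one spanning [1.0, 20.0] must still be scanned.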

In the patch, the min/max filtering infrastructure is integrated with
the nested loop join.
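One way to see why this integration is safe: the build side (the subquery) must be complete before its single row is known, and the filter only removes probe rows that the join predicate would reject anyway. The following is a hypothetical model, not Impala code; SingleRowBuild, BuildFilterBound, CountWithoutFilter and CountWithFilter are illustrative names. It checks that pre-filtering the scan leaves the count unchanged for the "ss_sales_price <= v" predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical build side of the nested loop join: the subquery's output.
struct SingleRowBuild {
  std::vector<double> rows;
};

// After the build side is complete, derive the filter's upper bound from its
// only row. The admitted range is [-inf, bound].
double BuildFilterBound(const SingleRowBuild& build) {
  // The optimization applies only when the subquery returns exactly one row.
  assert(build.rows.size() == 1);
  return build.rows[0];
}

// Without the filter: the join evaluates "price <= v" for every scanned row.
std::size_t CountWithoutFilter(const std::vector<double>& probe, double v) {
  std::size_t n = 0;
  for (double price : probe) {
    if (price <= v) ++n;
  }
  return n;
}

// With the filter: the scan drops rows outside [-inf, v] before they reach
// the join, which still evaluates its own predicate on the survivors.
std::size_t CountWithFilter(const std::vector<double>& probe, double v) {
  const double lo = -std::numeric_limits<double>::infinity();
  std::vector<double> survivors;
  for (double price : probe) {
    if (price >= lo && price <= v) survivors.push_back(price);  // scan side
  }
  return CountWithoutFilter(survivors, v);  // join side
}
```

The gain in this model is only that fewer rows reach the join; the result is identical because the filter range is implied by the join predicate.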

Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
---
M be/src/exec/nested-loop-join-builder.cc
M be/src/exec/nested-loop-join-builder.h
M be/src/exec/nested-loop-join-node.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/query-state.h
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/analysis/BinaryPredicate.java
M fe/src/main/java/org/apache/impala/analysis/Predicate.java
M fe/src/main/java/org/apache/impala/analysis/SlotRef.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
16 files changed, 395 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/17706/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17706

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7c2bb5baad622051d1002c9c162c672d428e5446
Gerrit-Change-Number: 17706
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>