Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2021/06/09 18:04:00 UTC

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17568 )


Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built. To turn
on the feature, set query option minmax_filter_threshold to a value
greater than 0.

In the patch, the existing query option enabled_runtime_filter_types
now helps determine the types of filters generated. The default value
ALL generates both bloom and min/max filters. The alternative value
BLOOM generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.
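
The mapping described above can be sketched in Python as follows. This is
only an illustration of the stated behavior, not the planner code from the
patch (which lives in RuntimeFilterGenerator.java); the names FilterType
and generated_filter_types are hypothetical.

```python
from enum import Enum


class FilterType(Enum):
    """Runtime filter kinds named in the patch description."""
    BLOOM = "BLOOM"
    MIN_MAX = "MIN_MAX"


def generated_filter_types(enabled_runtime_filter_types: str = "ALL") -> set:
    """Return the filter kinds generated for a given option value.

    Hypothetical helper: it mirrors the ALL / BLOOM / MIN_MAX behavior
    stated in the commit message and nothing more.
    """
    value = enabled_runtime_filter_types.strip().upper()
    if value == "ALL":
        return {FilterType.BLOOM, FilterType.MIN_MAX}
    if value == "BLOOM":
        return {FilterType.BLOOM}
    if value == "MIN_MAX":
        return {FilterType.MIN_MAX}
    raise ValueError(f"unsupported value: {enabled_runtime_filter_types!r}")
```

Calling generated_filter_types() with no argument models the default ALL.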

Testing:
  1). Added a new test in overlap_min_max_filters.test to verify
      that a min/max filter is generated for a partition column
      and is able to filter out all partitions.
  2). Core tests [TBD]
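
As a rough illustration of what test 1) checks, the sketch below prunes
partitions with a min/max range taken from the build side of an equi-join.
It assumes an integer partition key and an inclusive [min, max] range; the
function name prune_partitions is hypothetical and this is not the Impala
implementation.

```python
from typing import Iterable, List, Optional, Tuple


def prune_partitions(partition_values: Iterable[int],
                     build_min_max: Optional[Tuple[int, int]]) -> List[int]:
    """Keep only partitions whose key can match the build side.

    build_min_max is the inclusive (min, max) of the join key observed on
    the build side, or None if the build side produced no rows.
    """
    if build_min_max is None:
        # An empty build side cannot match any partition.
        return []
    lo, hi = build_min_max
    return [v for v in partition_values if lo <= v <= hi]


months = range(1, 13)                      # partition key values 1..12
some = prune_partitions(months, (3, 5))    # only partitions 3, 4, 5 survive
none = prune_partitions(months, (20, 30))  # range misses every partition
```

The last call corresponds to the case in which the filter removes all
partitions.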

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
2 files changed, 34 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17568
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9008/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 25 Jun 2021 14:24:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 14:

(1 comment)

Created a new Jira to track min/max filtering on Iceberg partitions: https://issues.apache.org/jira/browse/IMPALA-10777.

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc@1093
PS14, Line 1093:  
> nit: space
Done



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Jul 2021 18:50:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8924/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 18:37:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 18: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Jul 2021 07:42:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns by default,
to take advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
now helps determine the types of filters generated. The default value
ALL generates both bloom and min/max filters. The alternative value
BLOOM generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.
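
The fallback rule in the last sentence can be sketched as below. The commit
message does not say which value the patch assigns, so the constant here is
a placeholder chosen purely for illustration, and the function name is
hypothetical.

```python
# Placeholder only: the patch description does not state the value it uses.
HYPOTHETICAL_FALLBACK_THRESHOLD = 0.5


def effective_minmax_threshold(
        user_value: float,
        fallback: float = HYPOTHETICAL_FALLBACK_THRESHOLD) -> float:
    """Return the threshold to use for min/max filtering.

    A user value of 0 means "not set", in which case the fallback is
    substituted; any positive value is used as given.
    """
    if user_value < 0:
        raise ValueError("threshold must be non-negative")
    if user_value == 0:
        return fallback
    return user_value
```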

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
20 files changed, 357 insertions(+), 218 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/9
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8983/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jun 2021 17:09:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built. To turn
on the feature, set the new query option minmax_filter_partition_column
to true (default).

In the patch, the existing query option enabled_runtime_filter_types
now helps determine the types of filters generated. The default value
ALL generates both bloom and min/max filters. The alternative value
BLOOM generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

Testing:
  1). Added a new test in overlap_min_max_filters.test to verify
      that a min/max filter is generated for a partition column
      and is able to filter out all partitions.
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
10 files changed, 105 insertions(+), 61 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/3
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 3
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built and to
provide coverage for certain equi-joins in which the stats filters
are not feasible.

The new feature is turned on by default. To turn it off, set the new
query option minmax_filter_partition_column to false.
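
For illustration, a small Python helper that renders the corresponding SET
statement for a SQL client such as impala-shell. The option name comes from
the commit message; the helper render_set_statement is hypothetical and
only builds the statement text.

```python
import re


def render_set_statement(option: str, value) -> str:
    """Render a `SET option=value;` statement for a query option."""
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", option):
        raise ValueError(f"not a valid option name: {option!r}")
    if isinstance(value, bool):
        # Booleans are written in lowercase in the statement.
        rendered = "true" if value else "false"
    else:
        rendered = str(value)
    return f"SET {option}={rendered};"


# Turn the partition-column min/max filters off for the session.
disable_stmt = render_set_statement("minmax_filter_partition_column", False)
```

Here disable_stmt holds the text SET minmax_filter_partition_column=false;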

In the patch, the existing query option enabled_runtime_filter_types
is enforced when choosing the types of filters generated. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 323 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/13
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7278/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Jul 2021 16:57:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns by default,
to take advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
now helps determine the types of filters generated. The default value
ALL generates both bloom and min/max filters. The alternative value
BLOOM generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
13 files changed, 173 insertions(+), 71 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/6
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built and to
provide coverage for certain equi-joins in which the stats filters
are not feasible.

The new feature is turned on by default. To turn it off, set the new
query option minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
is enforced when choosing the types of filters generated. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 323 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/14
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8978/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jun 2021 02:26:29 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8881/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 15:13:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 15: Code-Review+2

Thanks, Qifan! LGTM!


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Jul 2021 16:56:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built. To turn
on the feature, set query option minmax_filter_threshold to a value
greater than 0.

In the patch, the existing query option enabled_runtime_filter_types
now helps determine the types of filters generated. The default value
ALL generates both bloom and min/max filters. The alternative value
BLOOM generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

Testing:
  1). Added a new test in overlap_min_max_filters.test to verify
      that a min/max filter is generated for a partition column
      and is able to filter out all partitions.
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
2 files changed, 34 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/2
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 2
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8926/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 20:09:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 18: Verified+1


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Jul 2021 13:40:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9000/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 13
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 14:44:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns by default,
taking advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
determines which types of filters are generated. The default value ALL
generates both bloom and min/max filters. The alternative value BLOOM
generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added a new test in overlap_min_max_filters.test to verify
      that a min/max filter is generated for a partition column
      and is able to filter out all partitions.
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
11 files changed, 112 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/4
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 4
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc@1093
PS14, Line 1093:  
> Done
Seems like you haven't uploaded the new PS.


http://gerrit.cloudera.org:8080/#/c/17568/14/testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
File testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test:

http://gerrit.cloudera.org:8080/#/c/17568/14/testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test@9
PS14, Line 9: F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
            : |  Per-Host Resources: mem-estimate=34.94MB mem-reservation=5.00MB thread-reservation=3 runtime-filters-memory=1.00MB
            : PLAN-ROOT SINK
            : |  output exprs: count(*)
            : |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
            : |
            : 03:AGGREGATE [FINALIZE]
            : |  output: count(*)
            : |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
            : |  tuple-ids=2 row-size=8B cardinality=1
            : |  in pipelines: 03(GETNEXT), 00(OPEN)
            : |
            : 02:HASH JOIN [INNER JOIN]
            : |  hash predicates: a.id = b.id
            : |  fk/pk conjuncts: assumed fk/pk
            : |  runtime filters: RF000[bloom] <- b.id
            : |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
            : |  tuple-ids=0,1 row-size=8B cardinality=12.82K
            : |  in pipelines: 00(GETNEXT), 01(OPEN)
            : |
            : |--01:SCAN HDFS [functional_parquet.alltypes b]
            : |     HDFS partitions=24/24 files=24 size=201.59KB
            : |     stored statistics:
            : |       table: rows=unavailable size=unavailable
            : |       partitions: 0/24 rows=12.82K
            : |       columns: unavailable
            : |     extrapolated-rows=disabled max-scan-range-rows=unavailable
            : |     mem-estimate=16.00MB mem-reservation=16.00KB thread-reservation=1
            : |     tuple-ids=1 row-size=4B cardinality=12.82K
            : |     in pipelines: 01(GETNEXT)
            : |
            : 00:SCAN HDFS [functional_parquet.alltypes a]
            :    HDFS partitions=24/24 files=24 size=201.59KB
            :    runtime filters: RF000[bloom] -> a.id
            :    stored statistics:
            :      table: rows=unavailable size=unavailable
            :      partitions: 0/24 rows=12.82K
            :      columns: unavailable
            :    extrapolated-rows=disabled max-scan-range-rows=unavailable
            :    mem-estimate=16.00MB mem-reservation=16.00KB thread-reservation=1
            :    tuple-ids=0 row-size=4B cardinality=12.82K
            :    in pipelines: 00(GETNEXT)
Why did the numbers change here while there is no new filter in the plan?


http://gerrit.cloudera.org:8080/#/c/17568/14/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
File testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test:

http://gerrit.cloudera.org:8080/#/c/17568/14/testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test@a25
PS14, Line 25: 
Why don't the bloom filters get generated in the new plans?



-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Jul 2021 10:11:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 16: Code-Review+2


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Jul 2021 16:57:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 16: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7278/


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Jul 2021 23:02:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns by default,
taking advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
determines which types of filters are generated. The default value ALL
generates both bloom and min/max filters. The alternative value BLOOM
generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
15 files changed, 301 insertions(+), 197 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/7
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8870/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Jun 2021 18:26:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns to take
advantage of the min/max filter infrastructure already built and to
provide coverage for certain equi-joins in which the stats filters
are not feasible.

The new feature is turned on by default. To turn it off, set the new
query option minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
is enforced when choosing which types of filters to generate. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.
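
As an illustration only, the option semantics described above can be
sketched in Python. This is not the code in RuntimeFilterGenerator.java;
the enum class and the helper filter_kinds_to_generate are hypothetical
names introduced for this sketch, and only the three option values and
their effects come from the description above.

```python
from enum import Enum


class EnabledRuntimeFilterTypes(Enum):
    """The three values of enabled_runtime_filter_types named above."""
    ALL = "ALL"
    BLOOM = "BLOOM"
    MIN_MAX = "MIN_MAX"


def filter_kinds_to_generate(option):
    """Hypothetical helper: return the filter kinds the option allows."""
    if option is EnabledRuntimeFilterTypes.ALL:
        # Default value: both kinds of filters are generated.
        return {"bloom", "min_max"}
    if option is EnabledRuntimeFilterTypes.BLOOM:
        return {"bloom"}
    if option is EnabledRuntimeFilterTypes.MIN_MAX:
        return {"min_max"}
    raise ValueError(f"unknown option value: {option!r}")
```

A planner following this rule would skip creating a min/max filter for
a join predicate whenever "min_max" is absent from the returned set.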

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 251 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/17
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns by default,
taking advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
determines which types of filters are generated. The default value ALL
generates both bloom and min/max filters. The alternative value BLOOM
generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 322 insertions(+), 172 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/12
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 12
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8944/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 7
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 20:40:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8979/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 23 Jun 2021 04:36:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns by default,
taking advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
determines which types of filters are generated. The default value ALL
generates both bloom and min/max filters. The alternative value BLOOM
generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
20 files changed, 326 insertions(+), 199 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/8
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partition columns by default,
taking advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
determines which types of filters are generated. The default value ALL
generates both bloom and min/max filters. The alternative value BLOOM
generates only bloom filters, and the alternative value MIN_MAX
generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
21 files changed, 371 insertions(+), 225 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/11
-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 18:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7282/ DRY_RUN=false


-- 

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Jul 2021 07:42:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8959/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 8
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Jun 2021 21:10:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built and to
provide coverage for certain equi-joins in which the stats filters
are not feasible.

The new feature is on by default. To turn it off, set the new query
option minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
is now enforced when choosing the types of filters generated. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.
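
As a rough illustration of the option semantics described above, the
following Python sketch models which filter kinds would be generated for a
single join target. It is not Impala's code: the function filter_kinds and
its parameters are hypothetical names introduced here, and the real planner
applies further conditions (such as the threshold) that are not modeled.

```python
from enum import Enum


class EnabledRuntimeFilterTypes(Enum):
    """Values of enabled_runtime_filter_types named in the commit message."""
    ALL = "ALL"
    BLOOM = "BLOOM"
    MIN_MAX = "MIN_MAX"


def filter_kinds(enabled_types, on_partition_column,
                 minmax_filter_partition_column=True):
    """Return the filter kinds a planner might generate for one join target.

    Hypothetical helper: it models only the option semantics described in
    the commit message, not Impala's actual RuntimeFilterGenerator logic.
    """
    kinds = []
    if enabled_types in (EnabledRuntimeFilterTypes.ALL,
                         EnabledRuntimeFilterTypes.BLOOM):
        kinds.append("BLOOM")
    if enabled_types in (EnabledRuntimeFilterTypes.ALL,
                         EnabledRuntimeFilterTypes.MIN_MAX):
        # Assumption: min/max filters on partition columns are gated by
        # minmax_filter_partition_column; other targets are unaffected.
        if not on_partition_column or minmax_filter_partition_column:
            kinds.append("MIN_MAX")
    return kinds
```

Under these assumptions, a partition-column target with ALL and
minmax_filter_partition_column set to false would get only a bloom filter,
while the default settings would yield both kinds.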

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Reviewed-on: http://gerrit.cloudera.org:8080/17568
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 251 insertions(+), 95 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 14:

(1 comment)

I took an initial look. The change looks good to me.

Is it possible to make it work for Iceberg partitions?
You can see an example of how to check if a column is used in the Iceberg partition spec:
https://github.com/apache/impala/blob/fcaea30b151d89f412816a8e49d5feeef6964a0f/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java#L966-L969

If that gets too complicated we can create a separate Jira for it.

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/17568/14/be/src/service/query-options.cc@1093
PS14, Line 1093:  
nit: space




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 14
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 05 Jul 2021 16:29:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns to take
advantage of the min/max filter infrastructure already built and to
provide coverage for certain equi-joins in which the stats filters
are not feasible.

The new feature is on by default. To turn it off, set the new query
option minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
is now enforced when choosing the types of filters generated. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters_mt_dop.test
M tests/query_test/test_runtime_filters.py
23 files changed, 247 insertions(+), 95 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/15

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9037/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 06 Jul 2021 17:14:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................

IMPALA-10738: Min/max filters should be enabled for partition columns

This patch enables min/max filters for partitioned columns by default,
to take advantage of the min/max filter infrastructure already built.
To turn off the feature, set the new query option
minmax_filter_partition_column to false.

In the patch, the existing query option enabled_runtime_filter_types
is used to control the types of filters generated. The
default value ALL generates both bloom and min/max filters. The
alternative value BLOOM generates only bloom filters, and the
alternative value MIN_MAX generates only min/max filters.

The normal control knobs minmax_filter_threshold (for threshold) and
minmax_filtering_level (for filtering level) still work. When the
threshold is 0, the patch automatically assigns a reasonable value
for the threshold.

Testing:
  1). Added new tests in
      overlap_min_max_filters_on_partition_columns.test;
  2). Core tests [TBD]

Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
---
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scanner.h
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/parquet/hdfs-parquet-scanner.h
M be/src/runtime/runtime-filter.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/TpcdsPlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters-hdfs-num-rows-est-enabled.test
M testdata/workloads/functional-planner/queries/PlannerTest/min-max-runtime-filters.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters_on_partition_columns.test
M tests/query_test/test_runtime_filters.py
20 files changed, 361 insertions(+), 218 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/68/17568/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 17: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Jul 2021 07:41:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9053/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Jul 2021 19:43:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8943/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 6
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 17:38:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10738: Min/max filters should be enabled for partition columns

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17568 )

Change subject: IMPALA-10738: Min/max filters should be enabled for partition columns
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8968/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89e135ef48b4bb36d70075287b03d1c12496b042
Gerrit-Change-Number: 17568
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Tue, 22 Jun 2021 16:32:30 +0000
Gerrit-HasComments: No