You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Qifan Chen (Code Review)" <ge...@cloudera.org> on 2021/03/31 17:18:46 UTC

[Impala-ASF-CR] [WIP] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Qifan Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17252


Change subject: [WIP] IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................

[WIP] IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
2 files changed, 101 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8561/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:25:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_ is
added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 223 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/18
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21:

retest


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Apr 2021 02:33:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 15:

(2 comments)

Thanks Qifan for adjusting the changes.
I have 2 comments for patch set 15.

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29
PS15, Line 29: Testing:
             :   1. Ran unit tests;
             :   2. Ran core tests.
Can we add a test to verify the content of MIN/MAX column in filter routing table?


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659
PS15, Line 659: minmax_filterPB.has_always_true() && minmax_filterPB.always_true()
After a filter is disabled and broadcasted, isn't this always evaluate to True?
Maybe we should check state.AlwaysTrueFilterReceived() instead?



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Apr 2021 01:19:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

This change also addresses a flaw with rows unexpectedly
filtered out, due to the reason that the always_true_ flag in
a min/max filter, when set, is ignored in the eval code path
in RuntimeFilter::Eval().

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
     verify that the min/max are displayed correctly when the
     min/max filter in hash join builder is set to always true,
     always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran runtime-filter-test;
  4. Ran core tests successfully.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Reviewed-on: http://gerrit.cloudera.org:8080/17252
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
9 files changed, 224 insertions(+), 30 deletions(-)

Approvals:
  Joe McDonnell: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 24
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17252/11/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/17252/11/tests/query_test/test_runtime_filters.py@30
PS11, Line 30: from tests.common.skip import SkipIfNotHdfsMinicluster
flake8: F401 'tests.common.skip.SkipIfNotHdfsMinicluster' imported but unused



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Apr 2021 19:12:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8597/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 17:15:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 16:

> Will add the new tests tomorrow. Thanks all for the great comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 01:23:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 20: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:45:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 23:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7082/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 18:57:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8553/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:48:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact.  Similarily, the new member always_false_flipped_ is
added to indicate that the always false flag is flipped from
true to false. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 214 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/17
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................

IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky

This change disables the overlap min/max filter test for hdfs in
erasure coding, due to the query plan change (from 3-node scan to
2-node scan) which splits the row groups among scan nodes differently.

The change also improves how a coordinator behaves when a just
arriving min/max filter is the last one to arrive or is always true.
Previously, the coordinator disables the corresponding filter
representation by setting it to Always True, which makes it
impossible to differentiate a true AlwaysTrue filter (say, set in the
hash join building step) from the one being disabled. A dedicated
Boolean variable minmaxDisabled_ is introduced to record the disabled
state. The Always True state of a filter is never altered. The
enhancement improves the display of the min and max column in
"Filter routing table" and "Final filter table" in profile. These two
columns now display the following possible values.
  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - The filter is always true;
  3. 'AlwaysFalse'    - The filter is always false;
  4. Real values      - The filter is neither always true or false,
                        fully updated with the min/max real values.

A third change introduced is to record, in profile for scan node, the
arrival time of min/max filters (in elapsed time since the system is
rebooted obtained by calling MonotonicMillis()). It can help the
diagnosis of late arrival of filters, when compared with the elpased
time when a row group is filtered with these filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M tests/query_test/test_runtime_filters.py
7 files changed, 112 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8514/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Apr 2021 16:27:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 17:

Three new tests are added in Test/overlap_min_max_filters.test.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 17:29:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a fact
and used when display the min and max column in
"Filter routing table" and "Final filter table" in profile. These
two columns now display the following possible values.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
6 files changed, 122 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/16
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 15: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Apr 2021 01:13:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

This change also addresses a flaw with rows unexpectedly
filtered out, due to the reason that the always_true_ flag in
a min/max filter, when set, is ignored in the eval code path
in RuntimeFilter::Eval().

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
     verify that the min/max are displayed correctly when the
     min/max filter in hash join builder is set to always true,
     always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran runtime-filter-test;
  4. Ran core tests successfully.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
9 files changed, 224 insertions(+), 30 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/22
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 20: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 16:01:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 22: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 18:04:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/8513/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 9
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 07 Apr 2021 15:33:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
File testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test:

http://gerrit.cloudera.org:8080/#/c/17252/21/testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test@285
PS21, Line 285: set minmax_filter_threshold=0.5;
              : select straight_join count(*) from
              : lineitem_orderkey_only a join [SHUFFLE] tpch_parquet.orders b
              : where a.l_orderkey = b.o_orderkey;
It looks like this test behave differently if filter arrived on time vs arrived late.
The jenkins failure seemingly because of both filters arrived late.
It is curious that late filter arrival lead to less row from lineitem_orderkey_only scan, not more.

In my local machine, if I add query option "set runtime_filter_wait_time_ms=5000;", this test pass.
But without setting wait time, an AlwaysFalse minmax filter seems to be broadcasted. This is a section of query profile where this test failed:


      lv-desktop:27000:
        Filter 1 arrival: 1s367ms
        Filter 0 arrival: 1s371ms
...
          Runtime filters: Not all filters arrived (arrived: [], missing [1, 0]), waited for 785ms. Arrival delay: 1s000ms.
...
          Filter 1 (0):
             - Files processed: 0 (0)
             - Files rejected: 0 (0)
             - Files total: 0 (0)
             - InactiveTotalTime: 0.000ns
             - RowGroups processed: 0 (0)
             - RowGroups rejected: 0 (0)
             - RowGroups total: 1 (1)
             - Rows processed: 1.81M (1810767)
             - Rows rejected: 1.81M (1810767)
             - Rows total: 2.14M (2142543)
             - Splits processed: 0 (0)
             - Splits rejected: 0 (0)
             - Splits total: 0 (0)
             - TotalTime: 0.000ns



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Apr 2021 21:07:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 20: Code-Review+1

(4 comments)

Thanks Qifan for addressing my previous comments! I do not have any addition suggestion.

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@23
PS15, Line 23:                         receive
> Please see my comment in scan-node.cc.
Done


http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@27
PS15, Line 27: lling 
> Done
Done


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc@239
PS15, Line 239: end
> Good question. 
Thanks Qifan for the detailed explanation! I do not have any more comment.


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@652
PS15, Line 652:       // Also add the min/max value for the accumulated filter as follows.
              :       //  'PartialUpdates' - The min and the max are partial
> Reword the comment as follows and try to avoid describe how the accumulated
Thanks Qifan for addressing my comment. I do not have any addition suggestion.



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 17:17:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a fact
and used when display the min and max column in
"Filter routing table" and "Final filter table" in profile. These
two columns now display the following possible values.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - The filter is always true;
  3. 'AlwaysFalse'    - The filter is always false;
  4. Real values      - The filter is neither always true or false,
                        fully updated with the min/max real values.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues relate to late arrival of filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
6 files changed, 105 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/15
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 23: Code-Review+2

Bumping to +2


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 19:02:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is the last one to arrive or is always true.
Previously, the coordinator disables the corresponding filter
representation by setting it to Always True, which makes it
impossible to differentiate a true AlwaysTrue filter (say, set in the
hash join building step) from the one being disabled. A dedicated
Boolean variable minmaxDisabled_ is introduced to record the disabled
state. The Always True state of a filter is never altered. The
enhancement improves the display of the min and max column in
"Filter routing table" and "Final filter table" in profile. These two
columns now display the following possible values.
  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - The filter is always true;
  3. 'AlwaysFalse'    - The filter is always false;
  4. Real values      - The filter is neither always true or false,
                        fully updated with the min/max real values.

A second change introduced is to record, in profile for scan node, the
arrival time of min/max filters (in elapsed time since the system is
rebooted obtained by calling MonotonicMillis()). It can help the
diagnosis of late arrival of filters, when compared with the elpased
time when a row group is filtered with these filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M tests/query_test/test_runtime_filters.py
7 files changed, 111 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8560/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 13:23:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 12:03:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc@236
PS10, Line 236:  Current "
              :                    "time(ms): $2",
> ctx.filter->arrival_delay_ms(), from which the max_arrival_delay is compute
It's better to rename the label.



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Apr 2021 21:40:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 18: Code-Review+1

(1 comment)

Looks good to me. Just have one more nit.
Please carry my +1 after fix.

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/18//COMMIT_MSG@12
PS18, Line 12: always_false_flipped_
nit: always_false_flipped_to_false_



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:00:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
     verify that the min/max are displayed corrected when the
     min/max filter in hash join builder is set to always true,
     always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 221 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/19
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 19
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8533/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 11
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 09 Apr 2021 19:32:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21:

> Patch Set 21:
> 
> Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ DRY_RUN=false

I think we need to rerun the gerrit-verify-dryrun since jenkins.impala.io down yesterday.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 11:39:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7067/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:13:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] [WIP] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: [WIP] IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 1
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Mar 2021 17:39:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................

IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky

This change disables the overlap min/max filter test for hdfs in
erasure coding, due to the query plan change (from 3-node scan to
2-node scan) which splits the row groups among scan nodes differently.

The change also improves how a coordinator behaves when a just
arriving min/max filter is the last one to arrive or is always true.
Previously, the coordinator disables the corresponding filter
representation by setting it to Always True, which makes it
impossible to differentiate a true AlwaysTrue filter (say, set in the
hash join building step) from the one being disabled. A dedicated
Boolean variable minmaxDisabled_ is introduced to record the disabled
state. The Always True state of a filter is never altered. The
enhancement improves the display of the min and max column in
"Filter routing table" and "Final filter table" in profile. These two
columns now display the following possible values.
  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - The filter is always true;
  3. 'AlwaysFalse'    - The filter is always false;
  4. Real values      - The filter is neither always true or false,
                        fully updated with the min/max real values.

A third change introduced is to record, in profile for scan node, the
arrival time of min/max filters (in elapsed time since the system is
rebooted obtained by calling MonotonicMillis()). It can help the
diagnosis of late arrival of filters, when compared with the elpased
time when a row group is filtered with these filters.

Testing:
  1. Ran unit tests;
  2. Ran core tests.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M tests/query_test/test_runtime_filters.py
7 files changed, 112 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:13:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 16:

Will add the new tests tomorrow. Thanks all for great comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 01:23:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8556/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 18
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 21:12:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 22:

> Address the addresses a flaw with rows unexpectedly
 > filtered out, due to the reason that the always_true_ flag in
 > a min/max filter, when set, is ignored in the eval code path
 > in RuntimeFilter::Eval().

The change is one line code change in runtime-filter-ir.cc, and an extra comment in min-max-filter.h for EvalOverlap().


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 22
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 17:04:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................

IMPALA-10647 Improve always-true min/max filter handling in coordinator

The change improves how a coordinator behaves when a just
arriving min/max filter is always true. A new member
'always_true_filter_received_' is introduced to record such a
fact. Similarily, the new member always_false_flipped_to_false_
is added to indicate that the always false flag is flipped from
'true' to 'false'. These two members only influence how the min
and max columns in "Filter routing table" and "Final filter
table" in profile are displayed as follows.

  1. 'PartialUpdates' - The min and the max are partially updated;
  2. 'AlwaysTrue'     - One received filter is AlwaysTrue;
  3. 'AlwaysFalse'    - No filter is received or all received
                        filters are empty;
  4. 'Real values'    - The final accumulated min/max from all
                        received filters.

A second change introduced is to record, in scan node, the
arrival time of min/max filters (as a timestamp since the system
is rebooted, obtained by calling MonotonicMillis()). A timestamp
of similar nature is recorded for hdfs parquet scanners when a
row group is processed. By comparing these two timestamps, one
can easily diagnose issues related to late arrival of min/max
filters.

Testing:
  1. Added three new tests in overlap_min_max_filters.test to
     verify that the min/max are displayed correctly when the
     min/max filter in hash join builder is set to always true,
     always false, or a pair of meaningful min and max values.
  2. Ran unit tests;
  3. Ran core tests successfully.

Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
---
M be/src/exec/parquet/hdfs-parquet-scanner.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/scan-node.cc
M be/src/runtime/coordinator-filter-state.h
M be/src/runtime/coordinator.cc
M be/src/util/min-max-filter.cc
M be/src/util/min-max-filter.h
M testdata/workloads/functional-query/queries/QueryTest/overlap_min_max_filters.test
8 files changed, 221 insertions(+), 28 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/52/17252/20
-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 21: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7072/


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 21
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 17:48:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 23: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 23
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 00:44:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Fang-Yu Rao (Code Review)" <ge...@cloudera.org>.
Fang-Yu Rao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 15:

(4 comments)

Hi Qifan, I only have some minor comments on this patch. Thank you very much for working on this!

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@23
PS15, Line 23: arrival time of min/max filters
Is it true that the logic added in ScanNode::WaitForRuntimeFilters() also applies for Bloom filters? Namely, the logged 'end' over there records the time when we are done waiting for all filters in 'filter_ctxs_' whether or not there might be some Bloom filters.

If the logic added also applies for Bloom filters, is there any particular reason why we do not tackle the case when a runtime filter is a Bloom filter in this patch?

I will also paste the questions above at ScanNode::WaitForRuntimeFilters() for easy reference.


http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@27
PS15, Line 27: relate
nit: related


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/exec/scan-node.cc@239
PS15, Line 239: end
Is it true that the logic added here also applies for Bloom filters? Namely, the logged 'end' here records the time when we are done waiting for all filters in 'filter_ctxs_' whether or not there might be some Bloom filters.

If the logic added also applies for Bloom filters, is there any particular reason why we do not tackle the case when a runtime filter is a Bloom filter in this patch?


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@652
PS15, Line 652:       // Also add the min/max value for partitioned joins, when all updates are available
              :       // or the filter is disabled due to being always true.
I was wondering whether the comment would be clearer if we rephrase this sentence as the following although it seems a bit verbose. Please also let me know if my understanding is correct. Thanks!

For partitioned joins, add the actual min/max values when all updates are available and the filter is neither always true nor always false. Add "AlwasyTrue" if at least one received filter is always true. Add "AlwaysFalse" when the aggregated filter is always false after all updates are received. All other cases are considered "PartialUpdates".



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 11 Apr 2021 19:04:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8545/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 16
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 01:23:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG@7
PS10, Line 7: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
> In my opinion, we should split this into 2 separate changes.
Agree to separate two changes.


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1586
PS10, Line 1586: params.min_max_filter().always_false()
Add a DCHECK to make sure always_false() return false.


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1637
PS10, Line 1637: 
unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1639
PS10, Line 1639: this
Could we use filter-id, instead of this pointer?



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 17:51:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Joe McDonnell (Code Review)" <ge...@cloudera.org>.
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 20: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 20
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <jo...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 20:12:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 17:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/15//COMMIT_MSG@29
PS15, Line 29: row group is processed. By comparing these two timestamps, one
             : can easily diagnose issues related to late arrival of min/max
             : filters.
> Consider adding these three tests for Final table as its content is sending
The test added in Patch set 17 looks good to me, thanks!


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h
File be/src/runtime/coordinator-filter-state.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator-filter-state.h@178
PS17, Line 178: True value means the always false flag in aggregated filter is flipped.
To further clarify the doc, please mention that a True value means the filter was flipped from True to False by coordinator.


http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/15/be/src/runtime/coordinator.cc@659
PS15, Line 659: 
> If AlwaysTrueFilterReceived() is true, then the accumulated filter is logic
New logic looks good to me. I'll continue my comments on patch set 17.


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/runtime/coordinator.cc@668
PS17, Line 668: state.AlwaysFalseFlipped()
The method name sounds ambiguous. Was it flipped from true to false, or false to true. Seems like the former. The method return true IF filter was an AlwaysFalse before being disabled (flipped to an AlwaysTrue) by coordinator.

Maybe "WasAlwaysFalse" is more descriptive?


http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/17252/17/be/src/util/min-max-filter.h@358
PS17, Line 358: 
              : std::string DebugString(const MinMaxFilterPB& filter);
              : bool AlwaysTrue(const MinMaxFilterPB& filter);
              : bool AlwaysFalse(const MinMaxFilterPB& filter);
              : std::string DebugString(const ColumnValuePB& value);
Any reason not making these methods as static methods under MinMaxFilter class?



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 17
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 12 Apr 2021 18:59:06 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10647 Improve always-true min/max filter handling in coordinator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10647 Improve always-true min/max filter handling in coordinator
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8537/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 15
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 10 Apr 2021 01:08:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10532: TestOverlapMinMaxFilters.test overlap min max filters seems flaky

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/17252 )

Change subject: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
......................................................................


Patch Set 10:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17252/10//COMMIT_MSG@7
PS10, Line 7: IMPALA-10532: TestOverlapMinMaxFilters.test_overlap_min_max_filters seems flaky
In my opinion, we should split this into 2 separate changes.
One is focus on fixing the test_overlap_min_max_filters, and the other is for the query profile improvement.


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc
File be/src/exec/scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/exec/scan-node.cc@236
PS10, Line 236:  Current "
              :                    "time(ms): $2",
I think this is redundant and potentially confuse users who try to debug query profile.
The information from wait_time (Maximum arrival delay) should be sufficient.
As far as I know, MonotonicMilis mostly used to calculate time interval, not to represent system time.

Besides, filter arrival time is already displayed under host profile like this:

      impala-executor-000-1:27000:
        Filter 16 arrival: 662ms
        Filter 14 arrival: 10s454ms
        Filter 6 arrival: 10s514ms
        Filter 8 arrival: 5m46s
        Filter 9 arrival: 5m46s
        Filter 10 arrival: 5m46s
        Filter 0 arrival: 7m29s
        Filter 1 arrival: 7m29s
        Filter 2 arrival: 7m29s


http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/17252/10/be/src/runtime/coordinator.cc@1614
PS10, Line 1614:     minmaxDisabled_ = true;
I think we should maintain the same behavior as bloom_filter_ in here, making the logic consistent for both filter type.

I understand that this change is made to distinguish between 1) an actual AlwaysTrue filter that originate from executor, vs 2) AlwaysTrue filter that comes from disabling by coordinator.

Instead, what if we have a new boolean variable to track 1). Say, a boolean variable 'always_true_minmax_received_'. Coordinator::FilterDebugString() then should refer this boolean variable instead of checking minmax_filterPB.always_true(). Similarly also for AlwaysFalse filter arrival.



-- 
To view, visit http://gerrit.cloudera.org:8080/17252
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I326317833979efcbe02ce6c95ad80133dd5c7964
Gerrit-Change-Number: 17252
Gerrit-PatchSet: 10
Gerrit-Owner: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 08 Apr 2021 16:25:43 +0000
Gerrit-HasComments: Yes