You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2023/02/16 16:44:08 UTC

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19506


Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- found no test for bloom filter sizes, only verified manually that
  the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
1 file changed, 19 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "David Rorke (Code Review)" <ge...@cloudera.org>.
David Rorke has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@713
PS1, Line 713:       filterSizeBytes_ = Math.max(filterSizeBytes_, filterSizeLimits.minVal);
It would be useful to log under trace the filters where we've reduced or increased the size based on the configured size limits (along with their original unbounded filterSizeBytes_ value).


http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@774
PS1, Line 774:         if (numBloomFilters >= maxNumBloomFilters) continue;
Not directly related to the NDV change but it would be useful to log under trace the filters that we're skipping here because we've exceeded the configured max count.



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Feb 2023 20:28:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 1:

(1 comment)

Hi Csaba, I have a question.

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@665
PS1, Line 665: buildKeyNdv_ = srcExpr_.getNumDistinctValues();
My understanding about bloom filter is that it is better to have slightly bigger size than too small size that can lead to higher false-positives.

In the JIRA, you have example of 8X reduction in size from 64KB to 8KB. How accurate is the NDV information to warrant such reduction?
Is it OK if NDV is being considered if the resulting filter size will go above certain threshold without it (say, only consider NDV if filter size will cross above 512KB)?



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Feb 2023 17:22:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG@20
PS7, Line 20: - Verified manually that the example in the ticket is fixed.
Why don't we add a test case for this?



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 19:22:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG@20
PS7, Line 20: - Verified manually that the example in the ticket is fixed.
> There is actually test cases that indirectly assert the bloom filter size: 
Actually, let me add test Csaba's test case from https://issues.apache.org/jira/browse/IMPALA-11924



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 20:01:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Riza Suminto, David Rorke, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19506

to look at the new patch set (#3).

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- found no test for bloom filter sizes, only verified manually that
  the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
1 file changed, 34 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Riza Suminto, David Rorke, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19506

to look at the new patch set (#2).

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- found no test for bloom filter sizes, only verified manually that
  the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
1 file changed, 47 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13661/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 06:54:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@665
PS1, Line 665: buildKeyNdv_ = srcExpr_.getNumDistinctValues();
> My understanding about bloom filter is that it is better to have slightly b
AFAIK the NDV estimates are much more conservative than the cardinality estimate we use, as the cardinality is decreased based on the estimated selectivity of predicates while NDV comes from column stats (possibly modified by expression, e.g. % 8 would decrease it 8, or a * b would multiply their NDVs too). If Impala creates too small Bloom filters at the moment then I think that it is the result of the "optimistic" cardinality estimates / small target FPP / small RUNTIME_FILTER_MAX_SIZE

So I think that this is a safe change in general. There is already query option RUNTIME_FILTER_MIN_SIZE to set a minimum for the Bloom filter size.

Was thinking a bit about the problem of wrong Bloom filter sizes and created a ticket about trying to get the sizes during query execution instead of relying on the planner:
IMPALA-11928


http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@774
PS1, Line 774:         if (numBloomFilters >= maxNumBloomFilters) continue;
> Not directly related to the NDV change but it would be useful to log under 
I agree with the usefulness of this, will add them in the next patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Feb 2023 09:46:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#6) to the change originally created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- Adjust test_bloom_filters and test_row_filters, raising the NDV of
  testcase such that the assertion is mintained.
- Verified manually that the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
3 files changed, 45 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#8) to the change originally created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch, the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- Adjust test_bloom_filters and test_row_filters, raising the NDV of
  the test case such that the assertion is maintained.
- Add 8KB bloom filter test case in test_bloom_filters.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
3 files changed, 56 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG@20
PS7, Line 20: - Add 8KB bloom filter test case in test_bloom_filters.
> Actually, let me add test Csaba's test case from https://issues.apache.org/
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 20:08:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 3: Code-Review+2

Upgrading my votes to +2 to move this patch forward.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jul 2023 15:53:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12388/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 1
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Feb 2023 17:04:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12462/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 19:38:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 3: Code-Review+1

This looks OK to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 19:41:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jul 2023 15:55:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 3:

(2 comments)

please ignore ps2, I think I misunderstood one of the comments

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@713
PS1, Line 713:       int logFilterSize = FeSupport.GetMinLogSpaceForBloomFilter(ndvEstimate_, targetFpp);
> It would be useful to log under trace the filters where we've reduced or in
Added the original filter size before applying limits to the trace. I think that this provides enough info about the effects of the limits.


http://gerrit.cloudera.org:8080/#/c/19506/1/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@774
PS1, Line 774:     // heavy-weight than the other filter types.
> I agree with the usefulness of this, will add them in the next patch.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 19:24:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9535/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jul 2023 15:55:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 4:

> Patch Set 4: Verified-1
> 
> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9535/

Two tests fail because runtime filter size reduced in those tests, which is a good sign. I'd like to help adjust the tests to maintain the assertion.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 06:24:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#7) to the change originally created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch, the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- Adjust test_bloom_filters and test_row_filters, raising the NDV of
  the test case such that the assertion is maintained.
- Verified manually that the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
3 files changed, 45 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 9: Verified+1

New tests pass locally for me.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 21:14:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/9535/


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jul 2023 20:49:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 5:

l_orderkey from tpch.lineitem has cardinality around 6M, but only has 1.5M NDV.
Patch set 5 update the two tests to increase the NDV of the testcases and maintain the assertions.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 06:29:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13666/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 19:35:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19506/7//COMMIT_MSG@20
PS7, Line 20: - Verified manually that the example in the ticket is fixed.
> Why don't we add a test case for this?
There is actually test cases that indirectly assert the bloom filter size: test_bloom_filters and test_row_filters.
Filter size do shrink after this patch, indicated by test failure in patch set 4 like this:
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/19875/

Patch set 6 and 7 adjust the test case instead of changing the assertion.



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 19:43:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9541/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 21:05:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch, the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- Adjust test_bloom_filters and test_row_filters, raising the NDV of
  the test case such that the assertion is maintained.
- Add 8KB bloom filter test case in test_bloom_filters.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Reviewed-on: http://gerrit.cloudera.org:8080/19506
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Michael Smith <mi...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
3 files changed, 56 insertions(+), 10 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved
  Michael Smith: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 21:05:32 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13667/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 20:28:47 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Michael Smith (Code Review)" <ge...@cloudera.org>.
Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Fri, 28 Jul 2023 20:56:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Smith <mi...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Sat, 29 Jul 2023 01:26:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has uploaded a new patch set (#5) to the change originally created by Csaba Ringhofer. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................

IMPALA-11924: Cap runtime filter NDV with build key NDV

Before this patch the NDV used for bloom filter sizing was based only
on the cardinality of the build side. This is ok for FK/PK joins but
can highly overestimate NDV if the build key column's NDV is smaller
than the number of rows.

This change takes the minimum of NDV (not changed by selectiveness)
and cardinality (reduced by selectiveness).

Testing:
- found no test for bloom filter sizes, only verified manually that
  the example in the ticket is fixed.

Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
---
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test
3 files changed, 48 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/06/19506/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Riza Suminto (Code Review)" <ge...@cloudera.org>.
Riza Suminto has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19506/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/19506/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@690
PS2, Line 690: Sets the filter size (in bytes) based on the filter type
Is this stamement still true after changing the method signature?


http://gerrit.cloudera.org:8080/#/c/19506/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@772
PS2, Line 772: // We only enforce a limit on the number of bloom filters as they are much more
             :     // heavy-weight than the other filter types.
nit: Looks like we also limit IN-LIST filter in the following loop. Maybe update this comment to mention that?



-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 17:10:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11924: Cap runtime filter NDV with build key NDV

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19506 )

Change subject: IMPALA-11924: Cap runtime filter NDV with build key NDV
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12459/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19506
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idaa46789663cb2e6d29f518757d89c85ff8e4d1a
Gerrit-Change-Number: 19506
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: David Rorke <dr...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Riza Suminto <ri...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Feb 2023 17:11:55 +0000
Gerrit-HasComments: No