You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org> on 2020/04/08 04:50:15 UTC

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15683


Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set runtime filter scheme for Kudu. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests with codegen disabled.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
24 files changed, 623 insertions(+), 360 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/2
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5749/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Apr 2020 06:20:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has removed a vote on this change.

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Removed Code-Review+1 by Aman Sinha <am...@cloudera.com>
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5748/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Apr 2020 05:08:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 24:

Could you do a pass over this with clang-format-diff? It's ready to be +2ed except for a few formatting issues.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 04 Jun 2020 17:37:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5995/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 May 2020 18:47:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6133/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 23 May 2020 22:54:42 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 17:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6116/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 20 May 2020 18:43:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> This is my understanding, I'll check further within Kudu team.
Right, sortedness in kudu is based on the (compound) primary key. So, min-max predicates can help in the following ways:
(1) eliminate range partitions -- eg a min/max on a timestamp could would help a lot if the table is timestamp-partitioned by day, whereas a bloom filter on timestamp is probably not useful since cardinality is very high

(2) convert to primary key-based range scans -- if we have inequality/equality predicates on a prefix of the primary key, we can convert those predicates into a range scan on the table using b-tree style indexes. Again if you had a min-max on timestamp and timestamp were the leading portion of your key, it's a big win. Or, if you had a min-max on timestamp, a composite primary key of (entity_id, timestamp), and an equality predicate on entity_id, it would still do the PK range scan conversion for a big win.

(3) normal row-by-row evaluation - this is still useful since, particularly for primitive columns, it's very fast to evaluate (SIMD-accelerated in latest release, etc). It's likely much faster to evaluate than a bloom filter which will probably have some frequency of L1 cache misses if not L2/LLC, plus the hash calculation and mixing itself which likely takes a couple cycles. So, you're right that it may not eliminate much data in some queries, but the cost might be small enough that it's worth doing even if it eliminates 10-20%.

To validate this I loaded 800M rows into a table using 'kudu perf loadgen' and then compared scanning with

$ :./build/latest/bin/kudu perf table_scan localhost default.loadgen_auto_0b01a9eba6d54f2ba7fa5b59e3a597c5 --columns=int_val --num-threads=10 --predicates ["AND", [">=", "int_val", 0]]

vs the same without the predicate. In this case the predicate matches all rows, but there's no special short-circuit logic, so it actually evaluates it everywhere. Wall clock times:

with the predicate:             1.7023 +- 0.0266 seconds time elapsed  ( +-  1.56% )
without the predicate:             1.6514 +- 0.0370 seconds time elapsed  ( +-  2.24% )

Also running perf record -a, I can see that about 5% of the total CPU was used on the tserver in ApplyPredicatePrimitive.

Admittedly, this overhead will go up on a percentage basis after some other in-flight work we have is finished (eg more CPU-efficient integer encoding bankim's working on) but I think we also have more room to optimize ApplyPredicatePrimitive by using AVX2 instead of just SSE4 instructions when available.

As for the overhead of sending to the coordinator and broadcasting, isn't this a _tiny_ amount of data? ie it's only two values, which for primitives is going to be a max of 16 bytes each, and for varlen, you could feasibly truncate to a fixed size bound even in the case of a long string.

So, I'm generally in favor of sending and evaluating both. In the best case it's a huge win (eliminating range partitions) and in the worst case it's probably not a major cost.

One further thought here as a future extension: we could provide an API in the Kudu scanner that says "this predicate is optional: drop it if you think it's not filtering much". For the case of these propagated predicates feeding into a join, dropping them at runtime if they are ineffective is worthwhile (I think Impala already does something like this in its scanner path, right?)



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:49:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 9:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234
PS9, Line 234:   "-DKUDU_HEADERS_NO_STUBS" 
nit formatting: trailing whitespace and you can wrap this into the rest of the string instead of putting it on its own line


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91
PS9, Line 91:     //local_bloom_filter->Insert(val, expr_eval->root().type());
I guess this was left by accident?


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106
PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> n_regionkey
This needs to be updated (though of course it will only change slightly). If you're not sure how to get this, see: https://cwiki.apache.org/confluence/display/IMPALA/Codegen

(the easiest way is probably just to add a line at the end of this function that logs fn->dump() or whatever)


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232
PS9, Line 232: }
             :       else if
nit: formatting ('else if' should go on the same line as prior '}')

Not sure if I've bugged you about clang-format before, but its a great way to help ensure that you're code reviews don't have these sorts of simple errors. See: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536

Note that clang-format is just a guide and if it's suggestion differs from surrounding code then you can feel free to ignore it


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234
PS9, Line 234:         continue;
Brief comment, eg. 'This filter won't actually remove any rows so we don't need to push it down to Kudu'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236
PS9, Line 236: else
nit: it doesn't actually change how the code works, but it would be better to leave off this 'else' since its not actually mutually exclusive with the prior 'if's


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238
PS9, Line 238: if (filter != nullptr) {
I think we can save ourselves some duplication and an indention level by changing this to 'ctx.filter->HasFilter()' and moving it up to the 'if(AlwaysTrue())'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239
PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find(
             :               scan_node_->id());
             :           const TRuntimeFilterTargetDesc &target_desc =
             :               ctx.filter->filter_desc().targets[it->second];
             :           const string &col_name = target_desc.kudu_col_name;
             :           DCHECK(col_name != "");
This is duplicated below, I think we can eliminate the duplication by moving this up to before the 'if(is_bloom)'


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53
PS9, Line 53: class BloomFilterBufferAllocator;
I don't think this is necessary


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73
PS9, Line 73: BloomFilterBufferAllocator
nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no confusion


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91
PS9, Line 91:   std::shared_ptr<kudu::BlockBloomFilterBufferAllocatorIf> Clone() const override {
If this is something that's really only used in Kudu's internal testing and won't ever get hit in Impala, it might be better to just do a LOG(FATAL) here or something, since then you would be able to simplify the logic of AllocateBuffer()/Close()/etc. and not support a code path that's never getting hit.

Probably have to check with Bankim to ensure that's really the case, though


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111
PS9, Line 111: /// A BloomFilter stores sets of items and offers a query operation indicating whether or
This class comment could probably use some updating to reflect the fact that this class is now basically a thin wrapper around Kudu's bloom filter


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171
PS9, Line 171:   void Insert(void* val, const ColumnType& col_type) noexcept;
Is this version of Insert() used anywhere? Same with the equivalent Find() below


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241
PS9, Line 241: HashAlgorithm hash_algorithm_;
As far as I can tell, this isn't actually getting used anywhere, and I don't think that it would be valid to pass any value other than FAST_HASH into Init() anyways (since we've hard-coded the use of fasthash in various places such as FilterContext::Insert()), so I think its probably better to remove this, the related parameter to Init(), etc.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205
PS9, Line 205:   if (is_allocated_) Close();
I suspect its probably supposed to be guaranteed that FreeBuffer() is always called before the allocator is destructed, in which case it would be good to do a LOG(DFATAL) here like we do in the BloomFilter destructor above (in general, its against Impala's style to do any real work in destructors, not sure about Kudu but I assume they have the same rule)


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226
PS9, Line 226:   Close(); // Ensure that any previously allocated memory is released.
I wonder if this is really necessary - I would hope that Kudu provides a guarantee that every call to AllocateBuffer will have a corresponding call to FreeBuffer. Might need to ask Bankim to be sure.


http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@723
PS9, Line 723: if (slotRef == null || slotRef.getDesc().getColumn() == null
             :               || (targetExpr instanceof CastExpr)
             :               || filter.getExprCompOp() == Operator.NOT_DISTINCT) {
             :             continue;
             :           }
This is almost identical to the 'if' below. I think we can reduce code duplication by moving this outside the "if(BLOOM) {} else(MIN_MAX) {}" section



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 May 2020 21:57:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,523 insertions(+), 623 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/20
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degradation for Init function, but this function is
   not in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Reviewed-on: http://gerrit.cloudera.org:8080/15683
Reviewed-by: Thomas Tauber-Marshall <tm...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,024 insertions(+), 661 deletions(-)

Approvals:
  Thomas Tauber-Marshall: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 27
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5817/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Apr 2020 07:14:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types. By default,
both bloom filter and min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
33 files changed, 1,445 insertions(+), 522 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/7
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5860/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Apr 2020 08:00:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71
PS11, Line 71: if (directory_in_size == 0)
> Could you clarify with a comment in what case would directory_in_size be 0 
wdirectory_size equal 0 only when the fileter is always false. In the case the bloom filter will not be pushed to Kudu. Will add a comment here.


http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201
PS11, Line 201:   if (is_allocated_) {
              :     LOG(DFATAL) << "Each call to AllocateBuffer must have a corresponding call "
              :                 << "to FreeBuffer.";
              :     Close(); // Ensure that any previously allocated memory is released.
              :   }
> Same instance of allocator can be used for multiple BBFs. So this doesn't l
Line 201 to 205 are not useful. Will remove them.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 09 May 2020 17:03:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#26). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degradation for Init function, but this function is
   not in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,024 insertions(+), 661 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/26
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 26
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Right, sortedness in kudu is based on the (compound) primary key. So, min-m
BTW forgot that I took some measurements of the ApplyPredicatePrimitive performance bfeore:

     int8   NOT NULL   (c >= 0 AND c < 2) 3152.2M evals/sec 0.88 cycles/eval
     int16  NOT NULL   (c >= 0 AND c < 2) 3309.6M evals/sec 0.85 cycles/eval
     int32  NOT NULL   (c >= 0 AND c < 2) 3384.0M evals/sec 0.85 cycles/eval
     int64  NOT NULL   (c >= 0 AND c < 2) 1847.6M evals/sec 1.57 cycles/eval
     float  NOT NULL   (c >= 0 AND c < 2) 3268.3M evals/sec 0.88 cycles/eval
     double NOT NULL   (c >= 0 AND c < 2) 2245.2M evals/sec 1.27 cycles/eval



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:56:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64
PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
> Reran the the bloom filter benchmark on my desktop before and after applyin
Sounds good; thanks!



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 14:02:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> It's possible that the comma separated list thing is a bad idea and won't w
Query option setting is treated i as a string in impala-shell. We can set the option value with comma in impala-shell. But we will get parsing error in function impala::ParseQueryOptions() since comma is used as separator of a list of query option KV string, like "runtime_filter_min_size=1024, runtime_filter_max_size=65536". If we set "enabled_runtime_filter_type=bloom, min_max", the function parse the string as two KV string  "enabled_runtime_filter_type=bloom," and "min_max". It get parsing error when parsing second KV string "min_max" since there is no "=" in the KV string.  As discussed offline, we will not support comma separated filter types. Instead, rename BLOOM_MIN_MAX to ALL.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Apr 2020 06:37:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> The code for showing the current settings in impala-shell are auto generate
It's possible that the comma separated list thing is a bad idea and won't work, but I'm not sure why the shell would be affected by it. I think it should be possible to just treat is as a string from the perspective of impala-shell, its only necessary to break it up into a list internally in Impala.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Apr 2020 18:56:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Instead of 'are not sorted(partially sorted within each tablet)',  I meant 
This is my understanding, I'll check further within Kudu team.

When the predicate pushed to Kudu includes partition key columns then pruning using min-max filter will be done first eliminating bunch of files containing column values namely cfiles.

When the predicate doesn't include partition key columns then all column values will need to be scanned and as per current implementation for every column value first Bloom filter will be checked before checking for range/min-max values[1]. So in case of non-partition key columns whether min-max filter will help depends upon whether min-max filter values are simply min and max of the values inserted in the Bloom filter or not derived from the values inserted in the Bloom filter. For the former case, I agree supplying min-max filter with Bloom filter won't help.

[1] https://github.com/apache/kudu/blob/master/src/kudu/common/column_predicate.h#L332



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 03:23:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set enabled runtime filter types. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
28 files changed, 834 insertions(+), 362 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 9:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt
File be/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/CMakeLists.txt@234
PS9, Line 234:   "-DKUDU_HEADERS_NO_STUBS" 
> nit formatting: trailing whitespace and you can wrap this into the rest of 
will fix as suggested.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc
File be/src/exec/filter-context.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@91
PS9, Line 91:     //local_bloom_filter->Insert(val, expr_eval->root().type());
> I guess this was left by accident?
Right, will remove these two lines.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/filter-context.cc@106
PS9, Line 106: // An example of the generated code for TPCH-Q2: RF002 -> n_regionkey
> This needs to be updated (though of course it will only change slightly). I
will add LlvmCodeGen::Print(*fn) at the end of function to dump the codegen functions, then update the sample in the functions' comments.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@232
PS9, Line 232: }
             :       else if
> nit: formatting ('else if' should go on the same line as prior '}')
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@234
PS9, Line 234:         continue;
> Brief comment, eg. 'This filter won't actually remove any rows so we don't 
will add comments


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@236
PS9, Line 236: else
> nit: it doesn't actually change how the code works, but it would be better 
will remove "else"


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@238
PS9, Line 238: if (filter != nullptr) {
> I think we can save ourselves some duplication and an indention level by ch
Agree, will change the code.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/exec/kudu-scanner.cc@239
PS9, Line 239: auto it = ctx.filter->filter_desc().planid_to_target_ndx.find(
             :               scan_node_->id());
             :           const TRuntimeFilterTargetDesc &target_desc =
             :               ctx.filter->filter_desc().targets[it->second];
             :           const string &col_name = target_desc.kudu_col_name;
             :           DCHECK(col_name != "");
> This is duplicated below, I think we can eliminate the duplication by movin
That's right. Will remove the duplication.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@53
PS9, Line 53: class BloomFilterBufferAllocator;
> I don't think this is necessary
will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@73
PS9, Line 73: BloomFilterBufferAllocator
> nit: maybe name this ImpalaBloomFilterBufferAllocator to ensure there's no 
Will rename it as suggested


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@91
PS9, Line 91:   std::shared_ptr<kudu::BlockBloomFilterBufferAllocatorIf> Clone() const override {
> If this is something that's really only used in Kudu's internal testing and
Will add LOG(FATAL) and simplify the logic of allocate/close functions.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@111
PS9, Line 111: /// A BloomFilter stores sets of items and offers a query operation indicating whether or
> This class comment could probably use some updating to reflect the fact tha
Will update the comments.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@171
PS9, Line 171:   void Insert(void* val, const ColumnType& col_type) noexcept;
> Is this version of Insert() used anywhere? Same with the equivalent Find() 
Will remove them


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.h@241
PS9, Line 241: HashAlgorithm hash_algorithm_;
> As far as I can tell, this isn't actually getting used anywhere, and I don'
Will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@205
PS9, Line 205:   if (is_allocated_) Close();
> I suspect its probably supposed to be guaranteed that FreeBuffer() is alway
Will add LOG(DFATAL).


http://gerrit.cloudera.org:8080/#/c/15683/9/be/src/util/bloom-filter.cc@226
PS9, Line 226:   Close(); // Ensure that any previously allocated memory is released.
> I wonder if this is really necessary - I would hope that Kudu provides a gu
Will LOG(DFALTAL) if is allocated.


http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/9/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@723
PS9, Line 723: if (slotRef == null || slotRef.getDesc().getColumn() == null
             :               || (targetExpr instanceof CastExpr)
             :               || filter.getExprCompOp() == Operator.NOT_DISTINCT) {
             :             continue;
             :           }
> This is almost identical to the 'if' below. I think we can reduce code dupl
They are not identical. Min-max filter support integer casting, Bloom filter don't  allow any casting.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 May 2020 18:13:37 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/2/be/src/exec/kudu-scanner.cc@269
PS2, Line 269:             const ColumnType& col_type = ColumnType::FromThrift(target_desc.kudu_col_type);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@713
PS2, Line 713:           // Kudu only supports targeting a single column, not general exprs, so the target
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@729
PS2, Line 729:           // Kudu only supports targeting a single column, not general exprs, so the target
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/15683/2/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@732
PS2, Line 732:           // Kudu also cannot currently return nulls if a filter is applied, so it does not
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Apr 2020 04:50:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 11:

(2 comments)

Apologize for not taking a look earlier. Lot of the changes looked specific to Impala.
I mainly looked at Impala BBF code that was updated to invoke one from kudu-util.

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@71
PS11, Line 71: if (directory_in_size == 0)
Could you clarify with a comment in what case would directory_in_size be 0 since this Init variant should really be used for deserialization case.


http://gerrit.cloudera.org:8080/#/c/15683/11/be/src/util/bloom-filter.cc@201
PS11, Line 201:   if (is_allocated_) {
              :     LOG(DFATAL) << "Each call to AllocateBuffer must have a corresponding call "
              :                 << "to FreeBuffer.";
              :     Close(); // Ensure that any previously allocated memory is released.
              :   }
Same instance of allocator can be used for multiple BBFs. So this doesn't look right to me.

With that in mind, I don't see need for is_allocated_member variable. ImpalaBloomFilterBufferAllocator should be a simple wrapper on top of Impala's default buffer pool and should be implemented as a singleton.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 May 2020 00:11:08 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 26:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6224/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 26
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 05 Jun 2020 18:34:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,420 insertions(+), 633 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/14
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@80
PS4, Line 80: ublic:
> unnecessary
It's defined to support the "clone" virtual function in base class.


http://gerrit.cloudera.org:8080/#/c/15683/4/be/src/util/bloom-filter.h@91
PS4, Line 91:   std::shared_ptr<kudu::BlockBloomFilterBufferAllocatorIf> Clone() const override {
> I don't understand what this is supposed to do - why wouldn't the cloned al
Yes, we cannot get the buffer pool from runtime filter bank without calling context.


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@521
PS4, Line 521:  
> Seems a little confusing, since users don't specify the numbers they use "B
Fixed


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/ImpalaService.thrift@524
PS4, Line 524: R_TYP
> We'll eventually want to apply min-max filters in other places like HDFS sc
Fixed


http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/4/common/thrift/PlanNodes.thrift@124
PS4, Line 124: MIN_MAX = 2
> We probably eventually want to add in-list filters as well, and we won't wa
The code for showing the current settings in impala-shell are auto generated. Still try to figure out how to show the setting for enabled filter type as comma separated list of filter type.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Apr 2020 18:57:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set runtime filter scheme for Kudu. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests with codegen disabled.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M tests/query_test/test_runtime_filters.py
24 files changed, 624 insertions(+), 360 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/3
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 24:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6202/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jun 2020 14:24:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5877/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sat, 25 Apr 2020 05:07:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Never mind the question above 'In that case would the bloom filter be redun
Thanks for your comments. I will try to add some heuristic in planner when assign filter to Kudu scanner node. By default, we assign both types of filter to Kudu.


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667
PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is assigned
> One additional useful test is with a SEMI-JOIN, since it is a common patter
Will add the suggested case.


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722
PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX
> Suggest setting the explain_level=2 such that the runtime filter shows what
Will change the explain_level=2 as suggested.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 18:59:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5939/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Sun, 03 May 2020 07:08:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> Thanks Bankim, Todd for the.illustrative explanation.   Agreed, for the (co
Never mind the question above 'In that case would the bloom filter be redundant ?'.   I must have been thinking of the FALSE case for the first predicate.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 14:49:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#23). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,017 insertions(+), 651 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/23
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 23
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org>.
Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 22:

(23 comments)

The patch is looking pretty good, thanks for all the work on this!

Mostly just some nits and additional testing left.

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204
PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE",
I don't think this is being used anywhere anymore, so it can be removed.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247
PS22, Line 247: NULL
nit: we prefer nullptr, here and elsewhere


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h
File be/src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146
PS22, Line 146:     // It's possible that log_heap_size is less than kLogBucketWordBits in unit test.
We definitely want to avoid making any changes to the be/src/kudu directory if possible, since it makes upgrading the newer versions trickier. Is it possible to rewrite the tests to not trigger this?


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86
PS22, Line 86: GetHashValueFastHash32
I think you mean 'GetHashValueFastHash', i.e. this function doesn't return a 32-bit value.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93
PS22, Line 93: GetHashValueFastHash32NonNull
Same here - this doesn't return a 32-bit value, so I think you meant to say GetHashValueFastHashNonNull


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34
PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is initialized once.
             : // To temporarily disable AVX2 for Bloom Filter in runtime testing, set flag
             : // disable_blockbloomfilter_avx2 as true. See kudu::BlockBloomFilter::has_avx2().
             : // kudu::BlockBloomFilter::has_avx2() is called by kudu::BlockBloomFilter
             : // constructor and kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala
             : // testing code should set this flag before creating impala::BloomFilter object
             : // which affect impala::BloomFilter::Insert() and impala::BloomFilter::Find(),
             : // or before calling impala::BloomFilter::or().
             : // This flag has no effect if the target CPU doesn't support AVX2.
This comment is a lot longer than necessary. Probably fine to just say something like "This flag is used in Kudu to temporarily disable avx2 support for testing purposes"


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40
PS22, Line 40: class BlockBloomFilter;
             : class BlockBloomFilterBufferAllocatorIf;
             : class Slice;
nit: I think currently as written none of these are needed, since Slice isn't used here and you're importing block_bloom_filter anyways.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70
PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public kudu::BlockBloomFilterBufferAllocatorIf {
I think this is now only referenced in bloom-filter.cc, so we can just move the definition there.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210
PS22, Line 210:  *
nit: formatting


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213
PS22, Line 213: /// Buffer allocator is owned by the bloom filter.
This comment seems weird. Maybe say something like "used by kudu to allocate memory for 'block_bloom_filter_'


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215
PS22, Line 215: e
nit: 'E'

and please leave an extra line between this and the definition above


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35
PS22, Line 35: #include "util/kudu-status-util.h"
nit: please alphabetize


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49
PS22, Line 49: delete
In general, we try to avoid using 'delete' as its error prone.

I'm not sure if there's a reason why it needs to be a pointer and not just a be a regular member variable of BloomFilter, but if there is then of course you can use unique_ptr.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@52
PS22, Line 52: // Kudu only support FastHash for BlockBloomFilter.
             : // Since Fasthash is strictly better than Murmur Hash2, we do not
             : // support Murmur Hash2 algorithm for Bloom filter.
How about we move this information to the class header on BloomFilter in bloom-filter.h, i.e. when you mention that its a thin wrapper around BlockBloomFilter also mention that we use FAST_HASH.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@57
PS22, Line 57: hash_seed_
I don't think this is ever actually used, so it can probably just be removed.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@176
PS22, Line 176:   
nit: formating


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/hash-util.h@30
PS22, Line 30: enum HashAlgorithm {
I don't think this is used anywhere, so lets remove it.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@722
PS22, Line 722: SlotRef slotRef = targetExpr.unwrapSlotRef(true);
This is unnecessary - what its doing is removing any possible casts around the SlotRef, but then you're checking for casts anyways in the 'if' directly below.

You can instead do a 'targetExpr instanceof SlotRef', and if true then cast targetExpr to a SlotRef and do the getColumn check.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@596
PS22, Line 596:   public void testKudu() {
I think it would be good to add some planner tests for the interaction between HDFS and Kudu scans in the same query, eg. have a bloom filter that gets assigned to both HDFS and Kudu scan nodes, have one that gets assigned to HDFS but not Kudu (eg. because its not targeting a slot ref), etc.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@635
PS22, Line 635:   public void testKuduTpch() {
Might be nice to set explainLevel = EXTENDED on at least one of these tests that includes bloom filters just to have it print out the [bloom] or [min-max] next to each filter in the plan.


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test:

http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@10
PS22, Line 10: 29200
Is it possible to add PROFILE sections for these first test cases like the ones further down have, with the aggregation over ProbeRows?

Might also be nice to think about anything else we can do to make sure these tests are actually doing what we think, eg. maybe also add a row_regex for the "n of n Runtime Filters Published" line to show that we really are generating both min-max and bloom filters.


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@396
PS22, Line 396: # Test case 4: Two StringMinMaxFilters generated in the same fragment (IMPALA-7272)
I think this test case can probably be removed (the memory allocation in MinMaxFilter that its testing is covered by the equivalent test in min_max_filter.test already)


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@423
PS22, Line 423: # Test case 5: SEMI_JOIN
Add a comment saying what's interesting about this test case?



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 02 Jun 2020 22:21:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.
Upgraded Kudu version to 389d4f1e1 to pull in Kudu Bloom Filter support
which is needed for the Kudu/Impala Bloom Filter integration.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
36 files changed, 1,469 insertions(+), 560 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/8
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 22:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py
File be/src/codegen/gen_ir_descriptions.py:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/codegen/gen_ir_descriptions.py@204
PS22, Line 204: ["RAW_VALUE_GET_HASH_VALUE",
> I don't think this is being used anywhere anymore, so it can be removed.
Right, will remove this entry.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/exec/kudu-scanner.cc@247
PS22, Line 247: NULL
> nit: we prefer nullptr, here and elsewhere
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h
File be/src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/kudu/util/block_bloom_filter.h@146
PS22, Line 146:     // It's possible that log_heap_size is less than kLogBucketWordBits in unit test.
> We definitely want to avoid making any changes to the be/src/kudu directory
Will change bloom-filter-test.cc to avoid hitting this error.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h
File be/src/runtime/raw-value.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@86
PS22, Line 86: GetHashValueFastHash32
> I think you mean 'GetHashValueFastHash', i.e. this function doesn't return 
Right, fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/runtime/raw-value.h@93
PS22, Line 93: GetHashValueFastHash32NonNull
> Same here - this doesn't return a 32-bit value, so I think you meant to say
will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter-test.cc@34
PS22, Line 34: // kudu::BlockBloomFilter::kCpu is a static variable and is initialized once.
             : // To temporarily disable AVX2 for Bloom Filter in runtime testing, set flag
             : // disable_blockbloomfilter_avx2 as true. See kudu::BlockBloomFilter::has_avx2().
             : // kudu::BlockBloomFilter::has_avx2() is called by kudu::BlockBloomFilter
             : // constructor and kudu::BlockBloomFilter::OrEqualArrayInternal(). Impala
             : // testing code should set this flag before creating impala::BloomFilter object
             : // which affect impala::BloomFilter::Insert() and impala::BloomFilter::Find(),
             : // or before calling impala::BloomFilter::or().
             : // This flag has no effect if the target CPU doesn't support AVX2.
> This comment is a lot longer than necessary. Probably fine to just say some
will make it short.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@40
PS22, Line 40: class BlockBloomFilter;
             : class BlockBloomFilterBufferAllocatorIf;
             : class Slice;
> nit: I think currently as written none of these are needed, since Slice isn
will remove these three lines.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@70
PS22, Line 70: class ImpalaBloomFilterBufferAllocator : public kudu::BlockBloomFilterBufferAllocatorIf {
> I think this is now only referenced in bloom-filter.cc, so we can just move
A buffer allocator is defined as regular member variable in BloomFilter class. If moving this class definition to bloom-filter.cc, we will get a compilation error which complains  incomplete type for the allocator member variable.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@210
PS22, Line 210:  *
> nit: formatting
will fix it


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@213
PS22, Line 213: /// Buffer allocator is owned by the bloom filter.
> This comment seems weird. Maybe say something like "used by kudu to allocat
will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.h@215
PS22, Line 215: e
> nit: 'E'
Will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@35
PS22, Line 35: #include "util/kudu-status-util.h"
> nit: please alphabetize
Will fix it


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@49
PS22, Line 49: delete
> In general, we try to avoid using 'delete' as its error prone.
will change it as regular member variable.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@52
PS22, Line 52: // Kudu only support FastHash for BlockBloomFilter.
             : // Since Fasthash is strictly better than Murmur Hash2, we do not
             : // support Murmur Hash2 algorithm for Bloom filter.
> How about we move this information to the class header on BloomFilter in bl
will move to header file.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@57
PS22, Line 57: hash_seed_
> I don't think this is ever actually used, so it can probably just be remove
will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/bloom-filter.cc@176
PS22, Line 176:   
> nit: formating
will fix it.


http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/15683/22/be/src/util/hash-util.h@30
PS22, Line 30: enum HashAlgorithm {
> I don't think this is used anywhere, so lets remove it.
will remote it.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
File fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java:

http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java@722
PS22, Line 722: SlotRef slotRef = targetExpr.unwrapSlotRef(true);
> This is unnecessary - what its doing is removing any possible casts around 
will fix it as suggested.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@596
PS22, Line 596:   public void testKudu() {
> I think it would be good to add some planner tests for the interaction betw
Will add a new test file bloom-filter-assignment.test for the new test cases.


http://gerrit.cloudera.org:8080/#/c/15683/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java@635
PS22, Line 635:   public void testKuduTpch() {
> Might be nice to set explainLevel = EXTENDED on at least one of these tests
will add query option with explain_level as extended for a query with both bloom filter and min-max filter.


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
File testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test:

http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@10
PS22, Line 10: 29200
> Is it possible to add PROFILE sections for these first test cases like the 
Will add profile sections with aggregation over ProbeRows and row_regex for the "n of n Runtime Filters Published".


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@396
PS22, Line 396: # Test case 4: Two StringMinMaxFilters generated in the same fragment (IMPALA-7272)
> I think this test case can probably be removed (the memory allocation in Mi
will remove it.


http://gerrit.cloudera.org:8080/#/c/15683/22/testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test@423
PS22, Line 423: # Test case 5: SEMI_JOIN
> Add a comment saying what's interesting about this test case?
It's suggested by Aman to add this test case. Will add comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jun 2020 13:11:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of Kudu BlockBloomFilter,
which build runtime bloom filter in Kudu BlockBloomFilter APIs with
FastHash as default hash algorithm. Removed the duplicated functions
from BloomFillter class.
Added a query option to set enabled runtime filter types. Pushed
down runtime filters to Kudu through Kudu client APIs.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.

Testing:
Passed end-end runtime filter tests.
Passed frontend Planner tests.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M tests/query_test/test_runtime_filters.py
29 files changed, 1,262 insertions(+), 365 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/6
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/bloom-filter-assignment.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
36 files changed, 2,017 insertions(+), 651 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/24
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 24
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
In the case of both Bloom and Min-Max filters, is the order of evaluation determined by Kudu ?  For columns that are not sorted (partially sorted within each tablet), the Min-Max filters could potentially not eliminate much and be wasted effort, so Bloom filter should be evaluated first.  On the other hand, for sorted columns,  it would make sense to apply Min-Max filters first since Bloom filter has false positives. 

It seems that for a specific column, creating two types of filters would incur overhead during execution - since both have to be sent to coordinator and broadcast to executors.  Any thoughts on that ?



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 02:08:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3: Code-Review+1

(2 comments)

Couple of suggestions about the tests, otherwise the planner side changes LGTM. I did look at the backend changes but would be better for someone else to review it too.

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
File testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test:

http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@667
PS3, Line 667: # RUNTIME_FILTER_SCHEME_KUDU is set as BLOOM, Bloom filter is assigned
One additional useful test is with a SEMI-JOIN, since it is a common pattern :
SELECT COUNT(*) FROM alltypes a WHERE a.id IN (SELECT b.id FROM alltypes b WHERE b.int_col < 10);


http://gerrit.cloudera.org:8080/#/c/15683/3/testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test@722
PS3, Line 722: RUNTIME_FILTER_SCHEME_KUDU=BLOOM_MIN_MAX
Suggest setting the explain_level=2 such that the runtime filter shows what type of filter it is 'bloom' or 'min-max'.  e.g RF000[bloom]



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 18:22:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,339 insertions(+), 619 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/12
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
34 files changed, 1,344 insertions(+), 620 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/11
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> BTW forgot that I took some measurements of the ApplyPredicatePrimitive per
Thanks Bankim, Todd for the.illustrative explanation.   Agreed, for the (compound) primary keys, the min-max predicate is well suited for range pruning - that was my thought too in my initial comment.  For #3, I didn't think it would provide much benefit but your numbers look compelling enough for the 'with predicate' case that it seems worth it.  It sounds like Todd, you would even prefer evaluating that first. In that case, would the bloom filter be redundant ?  It sounds like current behavior (from Bankim's comment) is the BF is evaluated first. 

In any case, yeah on further thought, the overhead of sending both is not too much and we could leave the decision making to Kudu.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 05:47:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/exec/kudu-scanner.cc@252
PS12, Line 252:         ImpalaBloomFilterBufferAllocator* allocator =
              :             new ImpalaBloomFilterBufferAllocator(
              :                 state_->filter_bank()->buffer_pool_client());
This doesn't look right.
The allocator that is used to create BBF must be passed down to kudu.
In current impl, the one stored in impala BF must be passed down. https://gerrit.cloudera.org/c/15683/12/be/src/util/bloom-filter.h#214



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 May 2020 23:15:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73
PS21, Line 73: use
uses


http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202
PS21, Line 202:   BufferPool *buffer_pool = ExecEnv::GetInstance()->buffer_pool();
              :   DCHECK(buffer_pool_client_ != nullptr);
With this implementation, a check that is_allocated is not true is needed to make sure previous allocation is not leaked?



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jun 2020 19:19:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py
File tests/query_test/test_runtime_filters.py:

http://gerrit.cloudera.org:8080/#/c/15683/5/tests/query_test/test_runtime_filters.py@31
PS5, Line 31: from tests.common.test_dimensions import (
flake8: F401 'tests.common.test_dimensions.create_exec_option_dimension' imported but unused



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 17 Apr 2020 07:04:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Jim Apple (Code Review)" <ge...@cloudera.org>.
Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/15683/12/be/src/benchmarks/bloom-filter-benchmark.cc@64
PS12, Line 64: // Machine Info: Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz
I'd suggest re-running these on your machine before and after applying this patch, to see if there are any regressions. There are an indirection or two that hopefully will be inlined away, but I'm not sure.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 May 2020 03:19:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 23:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6201/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 23
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jun 2020 14:06:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Aman Sinha (Code Review)" <ge...@cloudera.org>.
Aman Sinha has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift
File common/thrift/PlanNodes.thrift:

http://gerrit.cloudera.org:8080/#/c/15683/3/common/thrift/PlanNodes.thrift@124
PS3, Line 124:   BLOOM_MIN_MAX = 2
> In the case of both Bloom and Min-Max filters, is the order of evaluation d
Instead of 'are not sorted(partially sorted within each tablet)',  I meant that even within a tablet there's no sortedness of that column.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 02:19:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
38 files changed, 1,533 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/18
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
A be/src/util/bloom-filter.inline.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
35 files changed, 1,472 insertions(+), 557 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.
Added a new query option to set enabled runtime filter types, which
only affect Kudu scan node now. By default, both bloom filter and
min-max filter will be enabled for Kudu.
Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Updated bloom-filter-benchmark for the bloom-filter implementation
change.

Testing:
 - Passed test_kudu.py
 - Passed end-end test_runtime_filters.py.
 - Passed frontend Planner tests.
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
 - Ran bloom-filter-benchmark and verified there is no regression
   due to bloom-filter implementation changes.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-bank.h
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,380 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/17
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 14:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6058/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 May 2020 07:10:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 21:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.h@73
PS21, Line 73: use
> uses
fixed


http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/15683/21/be/src/util/bloom-filter.cc@202
PS21, Line 202:   BufferPool *buffer_pool = ExecEnv::GetInstance()->buffer_pool();
              :   DCHECK(buffer_pool_client_ != nullptr);
> With this implementation, a check that is_allocated is not true is needed t
will call close() first.



-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jun 2020 21:24:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 3:

Sorry for repeated comments, but one more thought occurred me to me: the most effective case of min-max filter is when min == max, and thus the filter becomes an equality. An equality predicate can then serve to prune Kudu hash partitions, which isn't doable with bloom or range predicates.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 10 Apr 2020 04:12:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6166/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 May 2020 18:25:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 8:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5920/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 May 2020 00:11:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all exhaustive tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,524 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/22
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/6016/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 May 2020 01:48:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................

IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Defined the BloomFilter class as the wrapper of kudu::BlockBloomFilter.
impala::BloomFilter build runtime bloom filter in kudu::BlockBloomFilter
APIs with FastHash as default hash algorithm.
Removed the duplicated functions from impala::BloomFillter class.
Pushed down bloom filter to Kudu through Kudu clinet API.

Added a new query option ENABLED_RUNTIME_FILTER_TYPES to set enabled
runtime filter types, which only affect Kudu scan node now. By default,
bloom filter is not enabled, only min-max filter will be enabled for
Kudu. With this option, user could enable bloom filter, min-max filter,
or both bloom and min-max runtime filters.

Added new test cases in PlannerTest and end-end runtime_filters test
for pushing down bloom filter to Kudu.
Added test cases to compare the number of rows returned from Kudu
scan when appling different types of runtime filter on same queries.
Updated bloom-filter-benchmark due to the bloom-filter implementation
change.

Bump Kudu version to d652cab17.

Testing:
 - Passed all core tests.

Performance benchmark:
 - Ran single_node_perf_run.py on TPC-H with scale as 30 for parquet
   and Kudu. Verified that new hash function and bloom-filter
   implementation don't cause regressions for HDFS bloom filters.
   For Kudu, there is one regression for query TPCH-Q9 and there
   are improvement for about 8 queris when appling both bloom and
   min-max filters. The bloom filter reduce the number of rows
   returned from Kudu scan, hence reduce the cost for aggregation
   and hash join. But bloom filter evaluation add extra cost for
   Kudu scan, which offset the gain on aggregation and join.
   Kudu scan need to be optimized for bloom filter in following
   tasks.
 - Ran bloom-filter microbenchmarks and verified that there is no
   regression for Insert/Find/Union functions with or without AVX2
   due to bloom-filter implementation changes. There is small
   performance degration for Init function, but this function is not
   in hot path.

Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
---
M be/CMakeLists.txt
M be/src/benchmarks/bloom-filter-benchmark.cc
M be/src/codegen/gen_ir_descriptions.py
M be/src/exec/filter-context.cc
M be/src/exec/kudu-scanner.cc
M be/src/kudu/util/block_bloom_filter.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.h
M be/src/runtime/raw-value.inline.h
M be/src/runtime/runtime-filter-bank.cc
M be/src/runtime/runtime-filter-ir.cc
M be/src/runtime/runtime-filter.h
M be/src/service/query-options-test.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/util/bloom-filter-ir.cc
M be/src/util/bloom-filter-test.cc
M be/src/util/bloom-filter.cc
M be/src/util/bloom-filter.h
M be/src/util/debug-util.cc
M be/src/util/debug-util.h
M be/src/util/hash-util.h
M bin/impala-config.sh
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M fe/src/main/java/org/apache/impala/planner/RuntimeFilterGenerator.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/runtime-filter-query-options.test
M testdata/workloads/functional-planner/queries/PlannerTest/tpch-kudu.test
A testdata/workloads/functional-query/queries/QueryTest/all_runtime_filters.test
A testdata/workloads/functional-query/queries/QueryTest/diff_runtime_filter_types.test
M testdata/workloads/functional-query/queries/QueryTest/runtime_filters.test
M tests/query_test/test_runtime_filters.py
M tests/query_test/test_spilling.py
37 files changed, 1,522 insertions(+), 622 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/83/15683/21
-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6121/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 21 May 2020 01:21:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15683 )

Change subject: IMPALA-3741 [part 2]: Push runtime bloom filter to Kudu
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/6186/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9100076f68ea299ddb6ec8bc027cac7a47f5d754
Gerrit-Change-Number: 15683
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Aman Sinha <am...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jb...@apache.org>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 01 Jun 2020 22:21:46 +0000
Gerrit-HasComments: No