You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Thomas Tauber-Marshall (Code Review)" <ge...@cloudera.org> on 2019/12/16 19:36:30 UTC

[Impala-ASF-CR] IMPALA-4400: aggregate runtime filters locally

Thomas Tauber-Marshall has posted comments on this change. ( http://gerrit.cloudera.org:8080/14538 )

Change subject: IMPALA-4400: aggregate runtime filters locally
......................................................................


Patch Set 15:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14538/15//COMMIT_MSG@18
PS15, Line 18: can done
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc
File be/src/exec/kudu-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/exec/kudu-scan-node-base.cc@110
PS15, Line 110:   if (filter_ctxs_.size() > 0) WaitForRuntimeFilters();
Maybe outside the scope of this patch, but it looks like the hdfs scanners don't call this until GetNext(), which is nice since it means that all of the setup can happen before we wait. Might want to do the same thing for Kudu.


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h
File be/src/runtime/coordinator-backend-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@136
PS15, Line 136:   bool HasFragmentIdx(int fragment_idx);
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/coordinator-backend-state.h@140
PS15, Line 140:   bool HasFragmentIdx(const std::unordered_set<int>& fragment_idxs);
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h
File be/src/runtime/query-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.h@161
PS15, Line 161:   RuntimeFilterBank* filter_bank() { return filter_bank_.get(); }
const


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc
File be/src/runtime/query-state.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/query-state.cc@a679
PS15, Line 679: 
Looks like maybe this was the only use of 'fragment_map_' anywhere and its unnecessary now?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h
File be/src/runtime/runtime-filter-bank.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@68
PS15, Line 68: AllocateScratchBloomFilter()
Not your change, but this is wrong, since the sentence refers to both bloom and min-max filters. Maybe change it to say "AllocateScratch*Filter()"


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@79
PS15, Line 79: PublishGlobalFilte
             : /// ()
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-filter-bank.h@184
PS15, Line 184: locked separately
typo


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h
File be/src/runtime/runtime-state.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/runtime/runtime-state.h@63
PS15, Line 63: /// A collection of items that are part of the global state of a query and shared across
Not your change, but this comment seems wrong/confusing to me. RuntimeState isn't shared across a query, its per-fragment instance, right?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc
File be/src/scheduling/query-schedule.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/scheduling/query-schedule.cc@246
PS15, Line 246:   for (const FInstanceExecParams& instance_params: instance_exec_params) {
Looks like you're relying on the fact that all instances for a particular host will be contiguous within instance_exec_params. Might want to document that assumption where instance_exec_params is declared, or even make it a map from host to finstances.


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/bloom-filter.cc@251
PS15, Line 251:   if (other.AlwaysFalse()) return;
Doesn't look like you're handling the always_true case, maybe add a DCHECK?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h
File be/src/util/min-max-filter.h:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.h@74
PS15, Line 74:   virtual void SetAlwaysTrue() = 0;
protected?


http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc
File be/src/util/min-max-filter.cc:

http://gerrit.cloudera.org:8080/#/c/14538/15/be/src/util/min-max-filter.cc@324
PS15, Line 324: // Sets 'always_true_' to true and clears the values of 'min_', 'max_', 'min_buffer_',
Seems pretty clear from the code, so probably unnecessary, or maybe leave it in the header.


http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/14538/15/common/protobuf/data_stream_service.proto@136
PS15, Line 136:   // optional int32 dst_fragment_idx = 3;
Any reason not to just completely remove this and renumber? Its completely internal so we don't really care about backwards compatibility.



-- 
To view, visit http://gerrit.cloudera.org:8080/14538
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iabeeab5eec869ff2197250ad41c1eb5551704acc
Gerrit-Change-Number: 14538
Gerrit-PatchSet: 15
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <mi...@gmail.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 19:36:30 +0000
Gerrit-HasComments: Yes