You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2019/11/01 23:52:47 UTC

[Impala-ASF-CR] IMPALA-7984: Port runtime filter from Thrift RPC to KRPC

Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/13882 )

Change subject: IMPALA-7984: Port runtime filter from Thrift RPC to KRPC
......................................................................


Patch Set 27: Code-Review+2

(19 comments)

LGTM. Please address the comments below.

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc
File be/src/benchmarks/bloom-filter-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/benchmarks/bloom-filter-benchmark.cc@290
PS27, Line 290:  std::shared_ptr
No need for shared_ptr


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1022
PS25, Line 1022: {
> Thanks for pointing this out.
Makes sense.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1072
PS25, Line 1072:       bloom_filter_directory.swap(state->bloom_filter_directory());
               :       DCHECK(rpc_params.bloom_filter().always_false()
> Thanks for the suggestion!
I guess it can simply be

*rpc_params.mutable_bloom_filter() = state->bloom_filter();


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1074
PS25, Line 1074:     || rpc_params.bloo
> Thanks for the suggestion.
Missed the part about the scope of state. Think it's fine to keep it as-is. Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/runtime/coordinator.cc@1098
PS25, Line 1098: 
> Since 'state' is defined inside of the critical section as mentioned above,
Makes sense.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/coordinator.cc@1096
PS27, Line 1096: reinterpret_cast<const char*>(&(bloom_filter_directory[0])),
               :             static_cast<unsigned long>(bloom_filter_directory.size())
Please see comments at BloomFilter::AddDirectorySidecar(). We can pass in the string directly instead.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc
File be/src/runtime/runtime-filter-bank.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/runtime/runtime-filter-bank.cc@187
PS27, Line 187: DCHECK(
nit: DCHECK_EQ


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h
File be/src/service/impala-server.h:

http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@35
PS25, Line 35: #include "gen-cpp/ImpalaInternalService.h"
> Thanks for this suggestion.
Looks like it's more involved than expected. Please feel free to defer it to the follow-up patch which removes ImpalaInternalService altogether.


http://gerrit.cloudera.org:8080/#/c/13882/25/be/src/service/impala-server.h@70
PS25, Line 70: class TSessionState;
             : class TQueryOptions;
> Thanks for pointing this out!
Sounds good.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc
File be/src/util/bloom-filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@a68
PS27, Line 68: 
             : 
             : 
Why not keep most of this header comment which explains what this test does and talks about the output arguments (e.g. success, protobuf, directory)


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@103
PS27, Line 103: *success = directory_y.compare(directory_y2) == 0 ? true : false;
*success = directory_y.compare(directory_y2) == 0;


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@365
PS27, Line 365: std::shared_ptr<
No need for shared_ptr


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@367
PS27, Line 367: 
nit: unnecessary blank line


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter-test.cc@395
PS27, Line 395:  TBloomFilter
BloomFilterPB


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h
File be/src/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.h@95
PS27, Line 95: const uint8_t* directory_in,
             :       size_t directory_in_size)
Can this interface take a string only instead for the directory ? Can we use directory.size() to get the size in Init() or are there cases in which passing a string won't work ?


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@79
PS27, Line 79: protobuf
rpc_params


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@80
PS27, Line 80: const char* directory,
             :     unsigned long directory_size
See comments below. This could be const string& directory. I believe faststring::append() also has an interface for taking string as input.


http://gerrit.cloudera.org:8080/#/c/13882/27/be/src/util/bloom-filter.cc@85
PS27, Line 85:   unique_ptr<kudu::faststring> sidecar_buf = make_unique<kudu::faststring>();
             :   sidecar_buf->append(directory, directory_size);
             :   unique_ptr<kudu::rpc::RpcSidecar> rpc_sidecar =
             :       kudu::rpc::RpcSidecar::FromFaststring(std::move(sidecar_buf))
Would like to point out an alternative would be to use a slice instead of a faststring. This avoids the need to copy into the faststring buffer but directory needs to be around until the RPC is done. On the other hand, keeping the code as-is leaves a cleaner interface so this may actually be better. If you choose to use the option of using a slice, please make sure to highlight this side-effect (i.e. directory needs to be alive until the RPC is done).

    kudu::Slice dir_slice(directory);
    kudu::rpc::RpcSidecar::FromSlice(dir_slice);


http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto
File common/protobuf/data_stream_service.proto:

http://gerrit.cloudera.org:8080/#/c/13882/27/common/protobuf/data_stream_service.proto@91
PS27, Line 91:   // The sidecar index associated with the directory of a Bloom filter.
             :   // A directory is a list of buckets representing the Bloom Filter contents,
             :   // laid out contiguously in one string for efficiency of (de)serialisation.
             :   // See BloomFilter::Bucket and BloomFilter::directory_.
             :   optional int32 directory_sidecar_idx = 4;
Since BloomFilterPB is also used as the representation of the aggregated bloom filter in the coordinator, does it make more sense to store the directory_sidecar_idx in UpdateFilterParamsPB instead ?



-- 
To view, visit http://gerrit.cloudera.org:8080/13882
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b394796d250286510e157ae326882bfc01d387a
Gerrit-Change-Number: 13882
Gerrit-PatchSet: 27
Gerrit-Owner: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Fang-Yu Rao <fa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Nov 2019 23:52:47 +0000
Gerrit-HasComments: Yes