You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "ZhangYao (Code Review)" <ge...@cloudera.org> on 2018/08/01 10:59:14 UTC

[kudu-CR] Implement BloomFilter Predicate in server side.

ZhangYao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11100


Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
9 files changed, 988 insertions(+), 14 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 1
Gerrit-Owner: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 11: Code-Review+2

(1 comment)

Awesome!  Thanks for sticking with this, I'm really excited about the optimizations that this will enable.

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347: 
> Because fs.proto already contains an enum which has UNKNOWN in the same nam
Sounds good, renaming enum variants isn't an issue (unless the tag value changes).



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 11
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 22:50:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 6:

(68 comments)

Looks good, most of my comments are stylistic.

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@45
PS6, Line 45: static const int kRandomSeed = 0xdeadbeef;
Can you use SeedRandom() instead? That way we'll use a different seed for every run of the test and get additional test coverage while retaining the ability to rerun a test with a particular seed using --test_random_seed.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@77
PS6, Line 77:                                        BloomFilterBuilder *bfb1,
            :                                        BloomFilterBuilder *bfb2) {
Nit: BloomFilterBuilder* bfb1


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@79
PS6, Line 79:     srandom(random_seed);
Don't need if you use SeedRandom().


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@83
PS6, Line 83:         uint64_t key = random();
Use util/Random instead.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@88
PS6, Line 88: const uint8_t *
const uint8_t*

(the theme here is that when using a pointer type, the asterisk should be adjacent to the type rather than separated by a space).


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@92
PS6, Line 92:         values->push_back(key);
emplace_back in new code. Elsewhere too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@791
PS6, Line 791:                                         std::vector<ColumnPredicate::BloomFilterInner>* bf,
Don't need std:: prefix. Below too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@793
PS6, Line 793:     std::vector<ColumnPredicate::BloomFilterInner> hold = *bf;
Nit: maybe name "orig_bloom_filters" to better convey its purpose?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@828
PS6, Line 828:     std::vector<ColumnPredicate::BloomFilterInner> bf_bak = *bf;
bf_copy maybe?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1382
PS6, Line 1382:   int n_keys = 5; // 0 1both hit bf1 and bf2, 2 3 4 only hit bf2.
0 1 both hit


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1414
PS6, Line 1414:   // 0 1both hit bf1 and bf2, 2 3 4 only hit bf2.
0 1 both hit


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1429
PS6, Line 1429:   // Test for BINARY type
Each of these "Test for ... type" is independent of the other tests, right? Wouldn't it be cleaner to declare them each as a separate TEST_F?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@275
PS6, Line 275: 
             :     const Slice& bloom_data() const {
             :       return bloom_data_;
             :     }
             : 
             :     size_t nhash() const {
             :       return nhash_;
             :     }
             : 
             :     HashAlgorithmInBloomFilter hash_algorithm() const {
             :       return hash_algorithm_;
             :     }
             : 
             :     void set_nhash(size_t nhash) {
             :       nhash_ = nhash;
             :     }
             : 
             :     void set_bloom_data(Slice bloom_data) {
             :       bloom_data_ = bloom_data;
             :     }
             : 
             :     void set_hash_algorithm(HashAlgorithmInBloomFilter hash_algorithm) {
             :       hash_algorithm_ = hash_algorithm;
             :     }
I don't see the point of these getters and setters since the members are all public to begin with. Why not just access the members directly?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@301
PS6, Line 301:       return (bloom_data_ == other.bloom_data_ && nhash_ == other.nhash_
             :               && hash_algorithm_ == other.hash_algorithm_);
Nit: reformat like:

  return (bloom_data_ == other.bloom_data_ &&
          nhash_ == other.nhash_ &&
          hash_algorithm_ == other.hash_algorithm_);

That way it's less noisy to add new members.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@304
PS6, Line 304:     // The slice of bloom filter.
Nit: separate each member from the member before it (or from operator==) with an empty line.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@308
PS6, Line 308: , currently support
             :     // murmur2 and cityhash.
I don't think this comment is necessary; the list of supported algorithms is embodied by the set of enum values.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@354
PS6, Line 354: const ColumnPredicate &
Nit: const ColumnPredicate&


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@425
PS6, Line 425:   // The vector of bloom filter in this predicate.
Nit: "The list of bloom filters in this predicate."


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@68
PS6, Line 68:         : predicate_type_(predicate_type),
            :           column_(move(column)),
            :           lower_(lower),
            :           upper_(upper) {
Nit: use the same indentation as in the other two constructors.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@113
PS6, Line 113:   CHECK(!bfs->empty());
Should check that bfs != nullptr.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@189
PS6, Line 189:   bloom_filters_.clear();
Hmm, why don't we clear values_ in this function? Why is it necessary to clear bloom_filters_ but not values_?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@226
PS6, Line 226:           predicate_type_ = PredicateType::None;
             :           lower_ = nullptr;
             :           upper_ = nullptr;
Could use SetToNone here.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@254
PS6, Line 254:     case PredicateType::InBloomFilter: {
If we allowed an empty list of bloom filters, could that simplify into something else?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@264
PS6, Line 264:           if (CheckValueInBloomFilter(lower_)) {
             :             predicate_type_ = PredicateType::Equality;
             :             upper_ = nullptr;
             :             bloom_filters_.clear();
             :           } else {
             :             SetToNone();
             :           }
Can you explain the rationale behind this simplification? It's not obvious to me.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@275
PS6, Line 275:           upper_ = nullptr;
Isn't upper_ guaranteed to be nullptr in this branch?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@277
PS6, Line 277:           if (CheckValueInBloomFilter(lower_)) {
             :             predicate_type_ = PredicateType::Equality;
             :             upper_ = nullptr;
             :             bloom_filters_.clear();
             :           } else {
             :             SetToNone();
             :           }
Also not understanding this simplification; could you explain in a comment?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@287
PS6, Line 287:           SetToNone();
Why is this case different from the L274-L275 case? That is, why do we use SetToNone() here but not there?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@443
PS6, Line 443:       if (!other.CheckValueInBloomFilter(lower_)) {
             :         SetToNone();
             :       }
Can you explain this?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@616
PS6, Line 616:       // Merge the optional lower and upper bound.
             :       if (other.lower_ != nullptr &&
             :           (lower_ == nullptr || column_.type_info()->Compare(lower_, other.lower_) < 0)) {
             :         lower_ = other.lower_;
             :       }
             :       if (other.upper_ != nullptr &&
             :           (upper_ == nullptr || column_.type_info()->Compare(upper_, other.upper_) > 0)) {
             :         upper_ = other.upper_;
             :       }
This pattern seems to repeat itself quite a bit in this class. Could you decompose it into one or two helper functions in an anonymous namespace?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@630
PS6, Line 630:         // value falls in list so change to Equality predicate
Nit: "bloom filter", not "list"


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@636
PS6, Line 636:  in list
in bloom filter


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@647
PS6, Line 647:       std::vector<const void *> new_values;
Nit: vector<const void*> new_values;

(no need for std:: prefix either).


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@650
PS6, Line 650:           new_values.push_back(value);
Nit: emplace_back in new code.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@870
PS6, Line 870:       return strings::Substitute("`$0` IS BloomFilter", column_.name());
IS IN BloomFilter, perhaps?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@357
PS6, Line 357:   // The bloomfilter for quick filter the row.
What does "quick filter the row" mean? Could you elaborate?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@362
PS6, Line 362:     // And We currently use CityHash backend.
Doesn't this depend on the value of hash_algorithm?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@365
PS6, Line 365:     optional HashAlgorithmInBloomFilter hash_algorithm = 3;
Does this need a default value specifier for CITY_HASH?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@377
PS6, Line 377: We will eventually add a special
             :     // predicate type for null-ness.
Nit: not your fault, but could you remove this sentence? We've since added IsNotNull and IsNull predicates.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@404
PS6, Line 404:     repeated BloomFilter bloom_filters = 1;
Nit: add a comment explaining what this is.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@406
PS6, Line 406:     // Lower and Upper is optional for InBloomFilter.
How do lower and upper work with bloom filters? Can you doc in a comment?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@408
PS6, Line 408:     optional bytes lower = 3 [(kudu.REDACT) = true];
Use field id 2 here and 3 for 'upper'; no reason to skip 2 since there wasn't a field 2 to begin with.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/key_util.cc
File src/kudu/common/key_util.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/key_util.cc@234
PS6, Line 234:       case PredicateType::InBloomFilter:  // Upper in InBloomFilter process as upper in Range.
Nit: "upper in InBloomFilter processed as upper in Range"

Below too (with 'lower' instead of 'upper')


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@44
PS6, Line 44: #include "kudu/util/bloom_filter.h"
Not sorted correctly.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@508
PS6, Line 508:   Arena arena(1024);
WireProtocolTest already has an arena, could reuse it?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@536
PS6, Line 536:     ASSERT_NO_FATAL_FAILURE(ColumnPredicateToPB(ibf, &pb));
NO_FATALS in new code. Below too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@555
PS6, Line 555: TEST_F(WireProtocolTest, TestColumnPredicateBloomFilterWithBound) {
The setup code is identical in both tests; can you decompose it into a new test fixture? Something like:

  class BFWireProtocolTest : public WireProtocolTest {
    <members or functions for these two tests>
  }

  TEST_F(BFWireProtocolTest, Blah) {
    ...
  }


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@416
PS6, Line 416:   const void* src;
             :   size_t size;
             :   src = bf_src.bloom_data().data();
             :   size = bf_src.bloom_data().size();
Combine these four lines into two (i.e. combine declaration and assignment).


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@455
PS6, Line 455: // Extract BloomFilterInner for bloom data for ColumnBloomFilterPredicate.
from


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@461
PS6, Line 461:   uint8_t* data_copy = static_cast<uint8_t*>(arena->AllocateBytes(bf_src.bloom_data().size()));
bf_src.bloom_data().size() is repeated three times; can you store in a local variable with a shorter name?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@463
PS6, Line 463:   const Slice* slice = arena->NewObject<Slice>(data_copy, bf_src.bloom_data().size());
Why do we need to allocate the slice itself from the arena? Seems like we turn around and immediately make a copy of it on L464, so couldn't we just declare a local slice for 'data_copy'?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@511
PS6, Line 511:  
Nit: unnecessary space here.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@514
PS6, Line 514: ColumnPredicatePB::BloomFilter *bloom_filter
Nit: ColumnPredicatePB::BloomFilter* bloom_filter


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@595
PS6, Line 595:         *predicate = ColumnPredicate::IsNull(col);
             :         break;
Nit: while you're here, could you fix this indentation?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@601
PS6, Line 601: < 1
Nit: == 0 is more clear


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@603
PS6, Line 603: on bloom filter contained
Nit: no bloom filters


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@607
PS6, Line 607:  bloom filter
Nit: "in bloom filter"


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@608
PS6, Line 608: no nhash or bloom filter or hash algorithm
Nit: "missing bloom filter details"


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@612
PS6, Line 612:         bloom_filters.push_back(bloom_filter);
emplace_back


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@21
PS6, Line 21: 
Nit: no empty line here.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@109
PS6, Line 109: 2nth
Nit: "(2n)th" so that it doesn't seem like you're trying to say "second". Below too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@110
PS6, Line 110: from
Nit: form. Below too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@113
PS6, Line 113: BloomFilterBuilder* bf1_contain,
Nit: put this one on its own line for better symmetry with the others. Below too.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@213
PS6, Line 213: const shared_ptr<CFileSet> &
Nit: const shared_ptr<CFileSet>&


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@532
PS6, Line 532:   // BloomFilter of column 0 contain and column 1 contain
Nit: terminate with a period.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h
File src/kudu/util/bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@57
PS6, Line 57:   explicit BloomKeyProbe(const Slice &key) : key_(key) {
Can you rewrite this constructor to chain to the new one?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@68
PS6, Line 68:   explicit BloomKeyProbe(const Slice &key, HashAlgorithmInBloomFilter hash_algorithm) :
The explicit keyword is only for constructors with one argument.

https://google.github.io/styleguide/cppguide.html#Implicit_Conversions


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@69
PS6, Line 69:                          key_(key) {
Reformat like this:

  explicit BloomKeyProbe(arg1 a, arg2 b)
      : key_(key) {
    ...
  }


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@76
PS6, Line 76:                 0);
Can you reformat:

  /*seed=*/ 0);



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 6
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Sep 2018 17:39:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#8).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,116 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/8
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#6).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,115 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/6
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 6
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#7).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,116 insertions(+), 24 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 7
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate-test.cc@20
PS5, Line 20: #include <stdlib.h>
cstdlib


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@20
PS5, Line 20: #include <stddef.h>
prefer <cstddef> and <cstdint>, and group with the other C++ headers.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@190
PS5, Line 190:  *d
here and elsewhere, put the asterisk with the type (not the variable name).


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@262
PS5, Line 262:   bool EvaluateCellForBloomFilter(DataType type, const void* cell) const;
Can this be private?


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@298
PS5, Line 298:   struct BloomFilterInner {
Please add docs for this class, and especially explain what 'nhash' is.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@905
PS5, Line 905:       return (lower_ == other.lower_ ||
You could reduce the duplication with Range by wrapping this in a lambda.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@930
PS5, Line 930:   return EvaluateCell(column_.type_info()->physical_type(), value);
Given this definition, why is this method necessary instead of directly calling EvaluateCell?


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@943
PS5, Line 943:     case PredicateType::InBloomFilter: rank = 6; break;
This should probably be ahead of IsNotNull.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/common.proto@352
PS5, Line 352:   message BloomFilter {
Could you extract out the HashAlgorithm message above, and add a field here for it as well?  I think that's the conclusion we came to in https://gerrit.cloudera.org/c/11333/



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 5
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 11 Sep 2018 23:27:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#11).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,069 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/11
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 11
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 9:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881
PS9, Line 881: InList
Should this be 'Equality'?


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230
PS9, Line 230:   bool EvaluateCellForBloomFilter(DataType type, const void* cell) const;
Is this used anywhere?  If it's test only please indicate that in the doc.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376
PS9, Line 376:       typedef typename DataTypeTraits<PhysicalType>::cpp_type cpp_type;
             :       size_t size = sizeof(cpp_type);
             :       const void* data = cell;
             :       if (PhysicalType == BINARY) {
             :         const Slice *slice = reinterpret_cast<const Slice *>(cell);
             :         size = slice->size();
             :         data = slice->data();
             :       }
             :       Slice cell_slice(reinterpret_cast<const uint8_t *>(data), size);
             :       BloomKeyProbe probe(cell_slice, bf.hash_algorithm());
This entire portion could be hoisted outside the for loop, which saves recomputing the probe in the case of multiple filters.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386
PS9, Line 386:       if (!BloomFilter(bf.bloom_data(), bf.nhash()).MayContainKey(probe)) {
Once this for loop is simplified to just be this call, it should further simplify to just use std::all_of


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115
PS9, Line 115:   return ColumnPredicate(PredicateType::InBloomFilter, move(column), bfs, lower, upper);
I think this should be calling Simplify(), or if there's a reason not to please leave a comment.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385
PS9, Line 385:       bloom_filters_.insert(bloom_filters_.end(), other.bloom_filters().begin(),
Pretty sure this can be simplified to a straight copy, eg:

bloom_filters_ = other.bloom_filters_;


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387
PS9, Line 387:       if (other.lower_ != nullptr &&
This range portion of this code doesn't need to be duplicated if you move InBloomFilter above Range, and let it fall-through.  Make sure to annotate with FALLTHROUGH_INTENDED, though.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588
PS9, Line 588:       if (new_values.empty()) {
Simplify() should already handle this, so I think you can reduce this to just the else clause.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644
PS9, Line 644:           new_values.emplace_back(value);
can this use copy_if like above?  I think it's more elegant that way.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648
PS9, Line 648:         SetToNone();
likewise, I think this is already handled by the Simplify() call.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661
PS9, Line 661:       if (other.lower_ != nullptr &&
likewise I think this can be simplified with the fallthrough trick.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895
PS9, Line 895:     case PredicateType::InBloomFilter: {
This could be simplified by moving it above Range and allowing it to fall through to check the range portion (after checking the annotations).  Annotate with FALLTHROUGH_INTENDED if you do that.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899
PS9, Line 899:       auto bound_equal = [&] (const void* eleml, const void* elemr) {
This is nice, consider doing the same simplification for Range.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347: enum HashAlgorithmInBloomFilter {
I suggested this back in PS5 but I think there was some confusion.  I think you can call this just 'HashAlgorithm', and use it for both bloom filters as well as in HashBucketSchemaPB (where the existing one will be removed).  To do that compatibly the existing HashAlgorithm values have to remain, so it should be:

HashAlgorithm {
  UNKNOWN = 0;
  MURMUR_HASH_2 = 1;
  CITY_HASH = 2;
}


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc@604
PS9, Line 604:         if (!bf.has_nhash() || !bf.has_bloom_data() || !bf.has_hash_algorithm()) {
This should also be checking for BloomFilter::UNKNOWN when that variant is added.



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Fri, 05 Oct 2018 05:08:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.h@67
PS4, Line 67:   InBloomFilter,
nit: I think 'BloomFilter' is shorter and just as clear.  The 'In' part of InList comes from the SQL IN syntax, which isn't at play here.


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.h@286
PS4, Line 286:   // Returns the nhash of bloom filter if this is a bloom filter predicate.
Add a space above.  Is the comment correct, what's 'nhash'?


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@42
PS4, Line 42: ColumnPredicate::ColumnPredicate(PredicateType predicate_type,
(meta-comment) If I understand the changes to this file correctly, Range predicates can now have an attached set of bloom filters?  If so I think this may be the wrong way to go about it, instead I think it would be cleaner to say that BloomFilter predicates can have an optional upper and lower bound.  In other words, if the bfs vector is non-empty then it must be a BloomFilter predicate type.


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@460
PS4, Line 460:         for (auto value : values_) {
Simplify with std::copy_if or something equivalent.


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@563
PS4, Line 563:       for (auto value : values_) {
likewise



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 4
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 15 Aug 2018 00:32:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 8:

(66 comments)

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@45
PS6, Line 45:  public:
> Can you use SeedRandom() instead? That way we'll use a different seed for e
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@77
PS6, Line 77:     Random rand(SeedRandom());
            :     uint64_t current = 0;
> Nit: BloomFilterBuilder* bfb1
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@79
PS6, Line 79:     for (int i = 0; i < 2; ++i) {
> Don't need if you use SeedRandom().
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@83
PS6, Line 83:           continue;
> Use util/Random instead.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@88
PS6, Line 88: 
> const uint8_t*
Ok, got it, thanks a lot. Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@92
PS6, Line 92:       }
> emplace_back in new code. Elsewhere too.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@791
PS6, Line 791:     vector<ColumnPredicate::BloomFilterInner> orig_bloom_filters = *bf;
> Don't need std:: prefix. Below too.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@793
PS6, Line 793:     // NONE
> Nit: maybe name "orig_bloom_filters" to better convey its purpose?
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@828
PS6, Line 828:               ColumnPredicate::IsNotNull(column),
> bf_copy maybe?
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1382
PS6, Line 1382:   BloomFilterBuilder bfb1(
> 0 1 both hit
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1414
PS6, Line 1414:   vector<Slice> keys_slice;
> 0 1 both hit
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate-test.cc@1429
PS6, Line 1429:           BloomFilterSizing::ByCountAndFPRate(n_keys, 0.01));
> Each of these "Test for ... type" is independent of the other tests, right?
Actually it is not independent. This Test is for InBloomFilter merge and the test is formed by three types tests:uint64, string and binary just like the other predicate test, so I think they should be together.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@275
PS6, Line 275: 
             :     BloomFilterInner() : nhash_(0), hash_algorithm_(CITY_HASH) {}
             : 
             :     const Slice& bloom_data() const {
             :       return bloom_data_;
             :     }
             : 
             :     size_t nhash() const {
             :       return nhash_;
             :     }
             : 
             :     HashAlgorithmInBloomFilter hash_algorithm() const {
             :       return hash_algorithm_;
             :     }
             : 
             :     void set_nhash(size_t nhash) {
             :       nhash_ = nhash;
             :     }
             : 
             :     void set_bloom_data(Slice bloom_data) {
             :       bloom_data_ = bloom_data;
             :     }
             : 
             :     v
> I don't see the point of these getters and setters since the members are al
Emmm I simply try to use setters and getters to let the calls look uniform. Maybe I should change the struct to class. Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@301
PS6, Line 301: 
             :     bool operator==(const BloomFilterInner& other) const {
> Nit: reformat like:
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@304
PS6, Line 304:               nhash_ == other.nhash() &&
> Nit: separate each member from the member before it (or from operator==) wi
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@308
PS6, Line 308: 
             : 
> I don't think this comment is necessary; the list of supported algorithms i
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@354
PS6, Line 354:  into this IS NOT NULL 
> Nit: const ColumnPredicate&
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@425
PS6, Line 425: 
> Nit: "The list of bloom filters in this predicate."
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@68
PS6, Line 68:     : predicate_type_(predicate_type),
            :       column_(move(column)),
            :       lower_(lower),
            :       upper_(upper) {
> Nit: use the same indentation as in the other two constructors.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@113
PS6, Line 113:   CHECK(bfs != nullptr);
> Should check that bfs != nullptr.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@189
PS6, Line 189:   upper_ = nullptr;
> Hmm, why don't we clear values_ in this function? Why is it necessary to cl
Emm, the origin code doesn't clear the values_ and it works well because use the predicate type to judge. I conservativly clear the bloom_filters_ actually if we don't clear is also ok. Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@226
PS6, Line 226:           SetToNone();
             :         }
             :       }
> Could use SetToNone here.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@254
PS6, Line 254:         return;
> If we allowed an empty list of bloom filters, could that simplify into some
Emmm, I think InBloomFilter predicate must contain as least one bloom filter. And allowed an empty list may not simplify the code.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@264
PS6, Line 264:             upper_ = nullptr;
             :             bloom_filters_.clear();
             :           } else {
             :             SetToNone();
             :           }
             :         }
             :       } els
> Can you explain the rationale behind this simplification? It's not obvious 
This simplification refer to the Range simplification above. Currently InBloomFilter predicate can contain range, so when we simply InBloomFilter predicate we should consider both. In this case lower_ and upper_ are consecutive which means this predicate may can simplify to Equality if lower_ hits the bloomfilters. If lower_ doesn't hit the bloomfilters this predicate should be None.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@275
PS6, Line 275:             predicate_type_ = PredicateType::Equality;
> Isn't upper_ guaranteed to be nullptr in this branch?
Yes, I just simply copy the Range simplify part. Should I modify the Range part too? Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@277
PS6, Line 277:           } else {
             :             SetToNone();
             :           }
             :         }
             :       } else if (upper_ != nullptr) {
             :         if (type_info->IsMinValue(upper_)) {
             :           S
> Also not understanding this simplification; could you explain in a comment?
This simplification refer to the Range simplification above. In this case lower_ is the max value which means this predicate may can simplify to Equality if lower_ hits the bloomfilters. If lower_ doesn't hit the bloomfilters this predicate should be None.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@287
PS6, Line 287:     };
> Why is this case different from the L274-L275 case? That is, why do we use 
If the lower_ is the min value means the value can be any value, so L274-L275 simply set lower_ to be nullptr and keep the bloomfilters intact. But if the upper_ is the min vlaue means no value can be selected so the predicate can simplify to None.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@443
PS6, Line 443:     };
             :   }
             :   LOG(F
> Can you explain this?
Equality merge InBloomFilter, if the value(lower_) of Equality can not hit the InBloomFilter(satisfy both bloomfilters and range bound if contains) then the merge result should be None.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@630
PS6, Line 630:         bloom_filters_.clear();
> Nit: "bloom filter", not "list"
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@636
PS6, Line 636: 
> in bloom filter
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@647
PS6, Line 647:         }
> Nit: vector<const void*> new_values;
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@650
PS6, Line 650:         SetToNone();
> Nit: emplace_back in new code.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@870
PS6, Line 870: }
> IS IN BloomFilter, perhaps?
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@357
PS6, Line 357:   // Represent a bloom filter.
> What does "quick filter the row" mean? Could you elaborate?
My fault, the comment is not precise. Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@362
PS6, Line 362:     optional bytes bloom_data = 2 [(kudu.REDACT) = true];
> Doesn't this depend on the value of hash_algorithm?
Yes, done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@365
PS6, Line 365: 
> Does this need a default value specifier for CITY_HASH?
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@377
PS6, Line 377: 
             :     optional bytes lower = 1 [(kudu.
> Nit: not your fault, but could you remove this sentence? We've since added 
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@404
PS6, Line 404:     // Lower and Upper is optional for InBloomFilter.
> Nit: add a comment explaining what this is.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@406
PS6, Line 406:     // merge result can be InBloomFilter whith range bound inside. And the lower
> How do lower and upper work with bloom filters? Can you doc in a comment?
When use both InBloomFilter and Range predicate for the same column the merge result can be InBloomFilter whith range bound inside. And the lower and upper works just like they in Range predicate. Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/common.proto@408
PS6, Line 408:     // The inclusive lower bound.
> Use field id 2 here and 3 for 'upper'; no reason to skip 2 since there wasn
My fault. Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/key_util.cc
File src/kudu/common/key_util.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/key_util.cc@234
PS6, Line 234:       case PredicateType::InBloomFilter:  // Upper in InBloomFilter processed as upper in Range.
> Nit: "upper in InBloomFilter processed as upper in Range"
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@44
PS6, Line 44: #include "kudu/util/test_util.h"
> Not sorted correctly.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@508
PS6, Line 508:   Arena arena(1024);
> WireProtocolTest already has an arena, could reuse it?
The above testcase use new local variable arena such as InList, so I want to keep consistent.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@536
PS6, Line 536:     NO_FATALS(ColumnPredicateToPB(ibf, &pb));
> NO_FATALS in new code. Below too.
Ok, Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@555
PS6, Line 555: TEST_F(WireProtocolTest, TestColumnPredicateBloomFilterWithBound) {
> The setup code is identical in both tests; can you decompose it into a new 
The BF test is juxtaposed with InList test above, It is necessary to do this? Or I should decompose the predicate test(include the InList) into a new test fixture?


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@416
PS6, Line 416:   const void* src = bf_src.bloom_data().data();
             :   size_t size = bf_src.bloom_data().size();
             :   bf_dst->mutable_bloom_data()->assign(reinterpret_cast<const char*>(src), size);
             :   bf_dst->set_hash_algorithm(bf_src.
> Combine these four lines into two (i.e. combine declaration and assignment)
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@455
PS6, Line 455:                                       ColumnPredicate::BloomFilterInner* dst_src,
> from
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@461
PS6, Line 461:   memcpy(data_copy, bf_src.bloom_data().data(), bloom_data_size);
> bf_src.bloom_data().size() is repeated three times; can you store in a loca
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@463
PS6, Line 463:   dst_src->set_hash_algorithm(bf_src.hash_algorithm());
> Why do we need to allocate the slice itself from the arena? Seems like we t
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@511
PS6, Line 511:  
> Nit: unnecessary space here.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@514
PS6, Line 514: 
> Nit: ColumnPredicatePB::BloomFilter* bloom_filter
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@595
PS6, Line 595:     };
             :     case Colum
> Nit: while you're here, could you fix this indentation?
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@601
PS6, Line 601: m f
> Nit: == 0 is more clear
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@603
PS6, Line 603: ter.bloom_filters()) {
> Nit: no bloom filters
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@608
PS6, Line 608: bloom_filter;
> Nit: "missing bloom filter details"
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol.cc@612
PS6, Line 612:       // Extract the optional lower and upper bound.
> emplace_back
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@21
PS6, Line 21: #include <cstddef>
> Nit: no empty line here.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@109
PS6, Line 109: (2n 
> Nit: "(2n)th" so that it doesn't seem like you're trying to say "second". B
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@110
PS6, Line 110:  blo
> Nit: form. Below too.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@113
PS6, Line 113: Builder* bf1_contain,
> Nit: put this one on its own line for better symmetry with the others. Belo
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@213
PS6, Line 213: const shared_ptr<CFileSet>& 
> Nit: const shared_ptr<CFileSet>&
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/tablet/cfile_set-test.cc@532
PS6, Line 532:   // BloomFilter of column 0 contain and column 1 contain.
> Nit: terminate with a period.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h
File src/kudu/util/bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@57
PS6, Line 57:   explicit BloomKeyProbe(const Slice &key, HashAlgorithmInBloomFilter hash_algorithm = CITY_HASH)
> Can you rewrite this constructor to chain to the new one?
I chose to use constructor with default value. I can also simply delete the origin constructor and modify the calls of origin constructor.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@68
PS6, Line 68:       default:
> The explicit keyword is only for constructors with one argument.
Done.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@69
PS6, Line 69:         h = util_hash::CityHash64(
> Reformat like this:
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@76
PS6, Line 76:     h_2_ = static_cast<uint32_t>(h >> 32);
> Can you reformat:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 09:22:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 9: Code-Review+1

Looks good, though I'd like Dan to take another look as he's our resident predicate expert.


-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 15:51:42 +0000
Gerrit-HasComments: No

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 9:

(5 comments)

Thanks a lot :)

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc@77
PS8, Line 77:                                 BloomFilterBuilder* bfb2) {
> You shouldn't call this more than once per test. So maybe add it to the fix
Get your point, I add rand_ as a member of Test and initialize it in constructor so each test case will have different rand seed and in the same test case will use the same seed.


http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc@1399
PS8, Line 1399:   ColumnPredicate::BloomFilterInner bf2(slice2, bfb2.n_hashes(), MURMUR_HASH_2);
> Still got some usage of push_back in this test instead of emplace_back.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@275
PS6, Line 275:           } else {
> Yes, please do.
Done


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@555
PS6, Line 555:     ASSERT_EQ(predicate->predicate_type(), PredicateType::InBloomFilter);
> Yeah, decomposing the one for InList would be great too.
I decompose the bloomfilter test and use unique_ptr to wrap the BloomFilterBuilder because it doesn't have a default constructor so I have to use it's point format as member variable.


http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/util/bloom_filter.h
File src/kudu/util/bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/util/bloom_filter.h@67
PS8, Line 67:       case CITY_HASH:
> Nit: don't need the comment here; it's obvious from the default value.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 26 Sep 2018 07:18:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#10).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,069 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/10
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 10
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 4:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/column_predicate.h@97
PS2, Line 97:   // Either (but not both) of the bounds may be a nullptr to indicate an
            :   // unbounded range on that end.
> I'm not sure range queries should be able to support BloomFilters. Is InBlo
Actually users are not suppose to use both range and bloomfilter for the same column , but this is possible to happen so I should handle this case and merge all the predicates for the same column.


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto@351
PS2, Line 351: // The bloomfilter fo
> One of the things that I think might be confusing in the future is there is
I name it BloomFilter to be consistent with other predicates message such as Range.


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto@352
PS2, Line 352: ssage BloomFilter {
             :     optional int32 nhash = 1;
             :     // We currently use CityHash backend
> It's a bit unclear what these fields are. Mind adding comments explaining?
Done


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/wire_protocol.cc@412
PS2, Line 412: // Copies a predicate bloom filter data from 'bf_src' into 'bf_dst'.
             : void CopyPredicateBloomFilterToPB(const ColumnPredicate::BloomFilterInner& bf_src,
             :                                   ColumnPredicatePB::BloomFilter* bf_dst) {
             :   bf_dst->set_nhash(bf_src.nhash());
             :   const void* src;
             :   size_t size;
             :   src = bf_src.bloom_data().data();
             :  
> This would be clearer if it took a BloomFilterInner instead of a Slice. Sam
Done


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc@104
PS2, Line 104: 
             :     ASSERT_OK(rsw.Finish());
             :   }
             : 
> Can you comment what these variables do and how they should be used? Same w
Done


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc@451
PS2, Line 451:   DoTestRangeScan(fileset, 2001, 2009);
> Would you mind adding a high-level summary of what this test is testing and
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 4
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Tue, 14 Aug 2018 10:45:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#5).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
9 files changed, 1,062 insertions(+), 9 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/5
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 5
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 8:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc@77
PS8, Line 77:     Random rand(SeedRandom());
You shouldn't call this more than once per test. So maybe add it to the fixture?


http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/common/column_predicate-test.cc@1399
PS8, Line 1399:   bfs.push_back(bf1);
Still got some usage of push_back in this test instead of emplace_back.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.h@275
PS6, Line 275: 
             :     BloomFilterInner() : nhash_(0), hash_algorithm_(CITY_HASH) {}
             : 
             :     const Slice& bloom_data() const {
             :       return bloom_data_;
             :     }
             : 
             :     size_t nhash() const {
             :       return nhash_;
             :     }
             : 
             :     HashAlgorithmInBloomFilter hash_algorithm() const {
             :       return hash_algorithm_;
             :     }
             : 
             :     void set_nhash(size_t nhash) {
             :       nhash_ = nhash;
             :     }
             : 
             :     void set_bloom_data(Slice bloom_data) {
             :       bloom_data_ = bloom_data;
             :     }
             : 
             :     v
> Emmm I simply try to use setters and getters to let the calls look uniform.
Gotcha. Yeah, I agree, if you want to keep the setters/getters as an abstraction, switch to a class so that default visibility is private. Or just drop the setters/getters. Whatever you prefer.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/column_predicate.cc@275
PS6, Line 275:             predicate_type_ = PredicateType::Equality;
> Yes, I just simply copy the Range simplify part. Should I modify the Range 
Yes, please do.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc
File src/kudu/common/wire_protocol-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/common/wire_protocol-test.cc@555
PS6, Line 555: TEST_F(WireProtocolTest, TestColumnPredicateBloomFilterWithBound) {
> The BF test is juxtaposed with InList test above, It is necessary to do thi
Yeah, decomposing the one for InList would be great too.

Test code tends to be verbose and it's easy for our eyes to glaze over when reviewing it, so anything you can do to make it shorter means we're more likely to give it the same detail in code review as we would non-test code.


http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/util/bloom_filter.h
File src/kudu/util/bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/11100/8/src/kudu/util/bloom_filter.h@67
PS8, Line 67:       case CITY_HASH: // Default use city hash.
Nit: don't need the comment here; it's obvious from the default value.


http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h
File src/kudu/util/bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/11100/6/src/kudu/util/bloom_filter.h@57
PS6, Line 57:   explicit BloomKeyProbe(const Slice &key, HashAlgorithmInBloomFilter hash_algorithm = CITY_HASH)
> I chose to use constructor with default value. I can also simply delete the
Default values are fine too; I just wanted to make sure the repeated code is consolidated.



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 8
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Mon, 24 Sep 2018 18:52:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 11:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate-test.cc@881
PS9, Line 881: Equali
> Should this be 'Equality'?
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@230
PS9, Line 230: 
> Is this used anywhere?  If it's test only please indicate that in the doc.
It currently not be used, I add this function for potential use, but it seems not needed now. I remove it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@376
PS9, Line 376:       size = slice->size();
             :       data = slice->data();
             :     }
             :     Slice cell_slice(reinterpret_cast<const uint8_t*>(data), size);
             :     for (const auto& bf : bloom_filters_) {
             :       BloomKeyProbe probe(cell_slice, bf.hash_algorithm());
             :       if (!BloomFilter(bf.bloom_data(), bf.nhash()).MayContainKey(probe)) {
             :         return false;
             :       }
             :     }
> This entire portion could be hoisted outside the for loop, which saves reco
Yeah, I correct it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.h@386
PS9, Line 386:     // Check optional lower and upper bound.
> Once this for loop is simplified to just be this call, it should further si
Because BloomKeyProbe is bf related, so this for loop can not be simplified to be one call and in that condition change it to std::all_of may not needed ?


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@115
PS9, Line 115:   CHECK(!bfs->empty());
> I think this should be calling Simplify(), or if there's a reason not to pl
There is only one condition that InBloomFilter can simplify is that it contains simplifiable bound. And IMO that kind of InBloomFilter should be formed by InBloomFilter and Range merge. When we merge the simplify() is already called. But as I offer this kind of interface to form that InBloomFilter(with bound) directly I should add simplify() call here. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@385
PS9, Line 385:                                    }), values_.end());
> Pretty sure this can be simplified to a straight copy, eg:
Yeah. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@387
PS9, Line 387:       Simplify();
> This range portion of this code doesn't need to be duplicated if you move I
Got it. Done.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@588
PS9, Line 588:   CHECK(predicate_type_ == PredicateType::InBloomFilter);
> Simplify() should already handle this, so I think you can reduce this to ju
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@644
PS9, Line 644:   }
> can this use copy_if like above?  I think it's more elegant that way.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@648
PS9, Line 648: namespace {
> likewise, I think this is already handled by the Simplify() call.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@661
PS9, Line 661:       if (!sel->IsRowSelected(i)) continue;
> likewise I think this can be simplified with the fallthrough trick.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@895
PS9, Line 895: 
> This could be simplified by moving it above Range and allowing it to fall t
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/column_predicate.cc@899
PS9, Line 899: 
> This is nice, consider doing the same simplification for Range.
Done


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/common.proto@347
PS9, Line 347: 
> I suggested this back in PS5 but I think there was some confusion.  I think
Because fs.proto already contains an enum which has UNKNOWN in the same namespace so here I use UNKNOWN_HASH instead. And I find that struct HashBucketSchema doesn't contain a hash algorithm member, it seems that we just ignore the hash algorithm from HashBucketSchemaPB so I think the renaming of UNKNOWN will not cause error.


http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/9/src/kudu/common/wire_protocol.cc@604
PS9, Line 604:         if (!bf.has_nhash()
> This should also be checking for BloomFilter::UNKNOWN when that variant is 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 11
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 11 Oct 2018 05:55:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#2).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
9 files changed, 988 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Reviewed-on: http://gerrit.cloudera.org:8080/11100
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,069 insertions(+), 40 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Dan Burkert: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 12
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#9).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/util/bloom_filter.h
10 files changed, 1,112 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/9
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 9
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 6:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate-test.cc
File src/kudu/common/column_predicate-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate-test.cc@20
PS5, Line 20: #include <cstdlib>
> cstdlib
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@20
PS5, Line 20: #include <cstddef>
> prefer <cstddef> and <cstdint>, and group with the other C++ headers.
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@190
PS5, Line 190: 
> here and elsewhere, put the asterisk with the type (not the variable name).
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@262
PS5, Line 262:   const std::vector<BloomFilterInner>& bloom_filters() const {
> Can this be private?
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.h@298
PS5, Line 298:     }
> Please add docs for this class, and especially explain what 'nhash' is.
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@905
PS5, Line 905:       auto bound_equal = [&] (const void* eleml, const void* elemr) {
> You could reduce the duplication with Range by wrapping this in a lambda.
A lambda seems clearly. Done.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@930
PS5, Line 930: }
> Given this definition, why is this method necessary instead of directly cal
This method is a wrap of EvaluateCell for bloom filter check. You can directly call EvaluateCell but I think it's easier to understand by using a function named clearly.


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/column_predicate.cc@943
PS5, Line 943:     default: LOG(FATAL) << "unknown predicate type";
> This should probably be ahead of IsNotNull.
Done


http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/5/src/kudu/common/common.proto@352
PS5, Line 352: // A predicate that can be applied on a Kudu column.
> Could you extract out the HashAlgorithm message above, and add a field here
Got it. Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 6
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Wed, 19 Sep 2018 09:14:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 2:

(6 comments)

Interesting feature! I just did a pass over the "big picture." The biggest thing I'm unsure of is why we're pushing bloom filters into range predicates. It seems like they are completely separate predicates, so why combine them with ranges?

Also it'd be helpful in reviewing if you could add some comments around some the new classes regarding how they are will be used (e.g. how merging works, etc.), as well as a high level overview of the design in the commit message.

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/column_predicate.h@97
PS2, Line 97:   static ColumnPredicate Range(ColumnSchema column, const void* lower, const void* upper,
            :                                std::vector<BloomFilterInner>* bfs = nullptr
I'm not sure range queries should be able to support BloomFilters. Is InBloomFilter ANDed with a Range not sufficient? If you add InBloomFilter as rank 4 in SelectivityRank() in column_predicate.cc, this should prioritize evaluating bloom filters first, and if it yields no rows, further range predicates should be skipped.

Regardless, it'd be helpful to add a comment on how 'bfs' works.


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto
File src/kudu/common/common.proto:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto@351
PS2, Line 351: message BloomFilter {
One of the things that I think might be confusing in the future is there is this BloomFilter (should probably be renamed to BloomFilterPB), the BloomFilter in utili/bloom_filter.h, and BloomFilterInner. Somewhere in this patch (maybe the commit message), it's worth explaining what each one is and how it is expected to be used. Ex.

BloomFilterPB: protobuf that gets serialized and sent via predicates in scans
BloomFilterInner: in-memory representation of BloomFilterPB
BloomFilter: underlying bloom filter implementation used by the server to check the presence of rows


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/common.proto@352
PS2, Line 352: optional int32 nhash = 1;
             :     // We currently use CityHash backend
             :     optional bytes bloom_data = 2 [(kudu.REDACT) = true];
It's a bit unclear what these fields are. Mind adding comments explaining?

Also, since this is a part of the public API, we'd like to make sure this is extendible in the future (e.g. allowing for different kinds of hashes, or different kinds of blooms). As such, maybe define an enum to specify the implementation details. See common.proto for an example.


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/wire_protocol.cc
File src/kudu/common/wire_protocol.cc:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/common/wire_protocol.cc@412
PS2, Line 412: // Copies a predicate bloom filter data from 'bf_src' into 'bf_dst'.
             : void CopyPredicateBloomDataToPB(const Slice& bf_src, string* bf_dst) {
             :   const void* src;
             :   size_t size;
             :   src = bf_src.data();
             :   size = bf_src.size();
             :   bf_dst->assign(reinterpret_cast<const char*>(src), size);
             : }
This would be clearer if it took a BloomFilterInner instead of a Slice. Same with CopyPredicateBloomDataFromPB, but if it took BloomFilterPB as an out-parameter.


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc
File src/kudu/tablet/cfile_set-test.cc:

http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc@104
PS2, Line 104: int nrows, BloomFilterBuilder* bf1_contain,
             :                        BloomFilterBuilder* bf1_exclude,
             :                        BloomFilterBuilder* bf2_contain,
             :                        BloomFilterBuilder* bf2_exclude)
Can you comment what these variables do and how they should be used? Same with GetBloomFilterResult().


http://gerrit.cloudera.org:8080/#/c/11100/2/src/kudu/tablet/cfile_set-test.cc@451
PS2, Line 451: TEST_F(TestCFileSet, TestBloomFilterPredicates) {
Would you mind adding a high-level summary of what this test is testing and how it accomplishes that? What are exclude vs contain, etc.?



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 2
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 07 Aug 2018 23:44:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Dan Burkert, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11100

to look at the new patch set (#4).

Change subject: Implement BloomFilter Predicate in server side.
......................................................................

Implement BloomFilter Predicate in server side.

Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
---
M src/kudu/common/column_predicate-test.cc
M src/kudu/common/column_predicate.cc
M src/kudu/common/column_predicate.h
M src/kudu/common/common.proto
M src/kudu/common/key_util.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/wire_protocol-test.cc
M src/kudu/common/wire_protocol.cc
M src/kudu/tablet/cfile_set-test.cc
9 files changed, 1,020 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/00/11100/4
-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 4
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Implement BloomFilter Predicate in server side.

Posted by "ZhangYao (Code Review)" <ge...@cloudera.org>.
ZhangYao has posted comments on this change. ( http://gerrit.cloudera.org:8080/11100 )

Change subject: Implement BloomFilter Predicate in server side.
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.h
File src/kudu/common/column_predicate.h:

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.h@286
PS4, Line 286:   }
> Add a space above.  Is the comment correct, what's 'nhash'?
Done


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc
File src/kudu/common/column_predicate.cc:

http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@42
PS4, Line 42: 
> (meta-comment) If I understand the changes to this file correctly, Range pr
True. I think your advice is very reasonable and I change the relevant code.


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@460
PS4, Line 460:     default: {
> Simplify with std::copy_if or something equivalent.
Done


http://gerrit.cloudera.org:8080/#/c/11100/4/src/kudu/common/column_predicate.cc@563
PS4, Line 563: 
> likewise
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11100
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I62c2de42667d0255d94e19db773240f7f9ee636c
Gerrit-Change-Number: 11100
Gerrit-PatchSet: 5
Gerrit-Owner: ZhangYao <tr...@gmail.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: ZhangYao <tr...@gmail.com>
Gerrit-Comment-Date: Thu, 16 Aug 2018 09:35:11 +0000
Gerrit-HasComments: Yes