You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org> on 2019/11/19 20:01:44 UTC

[kudu-CR] Import Impala's blocked based BloomFilter

Bankim Bhavsar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14745


Change subject: Import Impala's blocked based BloomFilter
......................................................................

Import Impala's blocked based BloomFilter

This change imports Impala's blocked based BloomFilter
that uses tiny BloomFilters that fit in cache lines.

Plan is to use this BloomFilter for pushing down
predicate from Impala.

Created a separate sub-directory kudu/impala/util
and retained impala namespace to distinguish
between existing kudu BloomFilter and this one.

Along with BloomFilter also imported some utility
methods as needed.

Made minor modifications mostly to adhere to the coding
standards and make the bots pass.

Added a simple smoke test to run the cpu info utility methods.

TODO: Build and run on the Mac.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M CMakeLists.txt
A src/kudu/impala/util/CMakeLists.txt
A src/kudu/impala/util/bloom-filter-test.cc
A src/kudu/impala/util/bloom-filter.cc
A src/kudu/impala/util/bloom-filter.h
A src/kudu/impala/util/compiler-util.h
A src/kudu/impala/util/cpu-info-test.cc
A src/kudu/impala/util/cpu-info.cc
A src/kudu/impala/util/cpu-info.h
A src/kudu/impala/util/pretty-printer.cc
A src/kudu/impala/util/pretty-printer.h
11 files changed, 1,479 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/1
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@12
PS1, Line 12: Added this BF as part of kudu-util and renamed to BlockBloomFilter to avoid name collision
            : with existing BloomFil
> Will you also evaluate whether Impala's BF could be used in Kudu's diskrows
Sure. I can look into this as a follow-up.


http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Removed dependencies on generated code like Thrift/Protobuf, Impala specific
            : utilities like BufferPool, CPUInfo.
            : 
> Works for me.
Done


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/bloom-filter.h
File src/kudu/impala/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/bloom-filter.h@22
PS1, Line 22: 
            : 
> nit for here and other files: consider using C++ headers, such as cstddef a
Done


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/compiler-util.h
File src/kudu/impala/util/compiler-util.h:

PS1: 
> We have a bunch (if not all) of this already in gutil/macros.h, gutil/port.
Done


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/cpu-info.cc
File src/kudu/impala/util/cpu-info.cc:

PS1: 
> Likewise this may overlap somewhat with gutil/cpu.{cc,h} and gutil/sysinfo.
Done


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/cpu-info.cc@156
PS1, Line 156: 
> All the code above for /proc/cpuinfo in this method doesn't work on macOS, 
Done


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/pretty-printer.h
File src/kudu/impala/util/pretty-printer.h:

PS1: 
> Subsumed by gutil/strings/human_readable.{cc,h}?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Nov 2019 18:04:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 10: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 10
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 10 Dec 2019 19:10:58 +0000
Gerrit-HasComments: No

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Andrew Wong, helifu, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#10).

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 777 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/10
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 10
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG
Commit Message:

PS5: 
> A few nits on writing commit messages as per guidelines mentioned at https:
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@7
PS5, Line 7: blocked based
> block-based
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@9
PS5, Line 9: blocked based
> block-based
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@10
PS5, Line 10: predicate
> predicates
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@12
PS5, Line 12: part
> a part
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@13
PS5, Line 13: kudu
> Kudu
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@15
PS5, Line 15: generated
> auto-generated
Done


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@22
PS5, Line 22: This BF uses AVX2 operations which may not be available on target CPU when run.
            : So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
            : flag and a runtime check is made to invoke the method only if the CPU supports AVX2.
            : 
            : Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally
            : compiled.
Done.

> In general do you know which segment(s) of our supported distro matrix fall into state #1?

Support for AVX2 instructions was added in GCC 4.7.0 and we require GCC 4.8+ for C+11 support. So support for AVX2 will always be available with GCC. https://gcc.gnu.org/gcc-4.7/changes.html

Couldn't figure about Clang.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@248
PS5, Line 248: execute_process (
> Nit: execute_process(
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@256
PS5, Line 256: if (${AVX2_SUPPORT})
> I think this might work too:
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@258
PS5, Line 258:   set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> Seems like COMPILE_OPTIONS is preferred now: https://cmake.org/cmake/help/l
Using the preferred method of COMPILE_OPTIONS here.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   vector<shared_ptr<BlockBloomFilter>> bloom_filters_;
> Based on usage, doesn't seem like shared ownership is warranted here. How a
shared_ptr is needed so that Close() can be invoked on the BBFs created by the test cases using CreateBloomFilter().

If the BBF API used regular constructor/destructor to allocate and de-allocate memory then there won't be need to keep track of BFFs. However the API uses Init/Close to allow integration with Impala.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@124
PS5, Line 124:   SeedRandom();
> You can SeedRandom() once in the test fixture constructor (or SetUp) instea
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@37
PS5, Line 37: // A BloomFilter stores sets of items and offers a query operation indicating whether or
> May want to add something here about how this is different from bloom_filte
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@63
PS5, Line 63:   virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0;
            :   virtual void FreeBuffer(void *ptr) = 0;
> Nit: void** and void* (space after the asterisk).
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@134
PS5, Line 134:   // The BloomFilter is divided up into Buckets
> Nit: comments should terminate with a period, here and elsewhere.
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@188
PS5, Line 188: 
             :   // Detect at run-time whether CPU supports AVX2
             :   bool has_avx2() const {
             :     return !FLAGS_disable_blockbloomfilter_avx2 && cpu_.has_avx2();
             :   }
> These checks are on the hot path; they add a few branches to each BF operat
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@33
PS5, Line 33: DEFINE_bool(disable_blockbloomfilter_avx2, false,
> should this be marked hidden or experimental?
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@48
PS5, Line 48:   cpu_(base::CPU()) {}
> Depending on how often BBFs are constructed/destroyed, this may get expensi
Made the cpu_ member static so it's initialized only once.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@52
PS5, Line 52:     "Close() should have been called before the object is destroyed.";
> Yeah if the convention in Kudu is to release resources in the destructor, m
Ack. Using Init/Close v/s the default constructor/destructor approach is not intuitive. However this comes from Impala and will allow for easier integration.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@64
PS5, Line 64:   DCHECK(log_num_buckets_ <= 32) << "Bloom filter too large. log_space_bytes: "
> Yeah, this invariant I think came from the Impala planner not generating fi
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@138
PS5, Line 138: ATTRIBUTE_NO_SANITIZE_INTEGER
> why do we not want to sanitize this and the above? it seems like overflow w
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@178
PS5, Line 178:   if (has_avx2()) {
> do all of our supported compiler toolchains support function multiversionin
For GCC, we require 4.8+ so function multi-versioning is supported there.

Addressed run-time branching issue using function pointers that are evaluated in constructor.

Few points in favor of this approach:
- We need to support the case where compiler doesn't support AVX2 so not sure how function multi-versioning will work if the compiler doesn't support the target architecture.
- Using function multi-versioning, we won't be able to test both non-AVX2 and AVX2 variants on the same machine as currently using a simple toggle flag (at least I couldn't figure)


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@187
PS5, Line 187:   int ret_code = posix_memalign(ptr, 64, bytes);
> I think we have a constant somewhere (maybe port.h) for CACHELINE_SIZE or s
Done


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@192
PS5, Line 192:   DCHECK(ptr != nullptr);
> or free(DCHECK_NOTNULL(ptr))
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 05 Dec 2019 18:27:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
> This is will be tricky
If I were you I'd fully kudu-ify the code - i.e. prune out unnecessary stuff and use Kudu utilities where appropriate - and not worry about making it consumable by Impala. We do need to be sure the binary representation is the same, but I think we can accomplish that with inspection and testing.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 21:52:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/3/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/3/src/kudu/util/CMakeLists.txt@260
PS3, Line 260:   add_definitions(-DUSE_AVX2=1)
There appears to be a mechanism to set for particular file. Figuring it out...



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 25 Nov 2019 22:08:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@33
PS5, Line 33: DEFINE_bool(disable_blockbloomfilter_avx2, false,
should this be marked hidden or experimental?


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@52
PS5, Line 52:     "Close() should have been called before the object is destroyed.";
a bit of a strange API, but I guess we are borrowing this from Impala


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@64
PS5, Line 64:   DCHECK(log_num_buckets_ <= 32) << "Bloom filter too large. log_space_bytes: "
should this be either a CHECK or a return of InvalidArgument? seems like a DCHECK could result in continuing on with incorrect results or out-of-bounds access etc


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@138
PS5, Line 138: ATTRIBUTE_NO_SANITIZE_INTEGER
why do we not want to sanitize this and the above? it seems like overflow would be a bug?


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@178
PS5, Line 178:   if (has_avx2()) {
do all of our supported compiler toolchains support function multiversioning? https://lwn.net/Articles/691932/
Perhaps this is a slightly more performant way of doing this vs branching, though could be added in a TODO.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@187
PS5, Line 187:   int ret_code = posix_memalign(ptr, 64, bytes);
> Why is this alignment required? I'll take my answer in the form of a code c
I think we have a constant somewhere (maybe port.h) for CACHELINE_SIZE or something which we could use in place of 64


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@192
PS5, Line 192:   DCHECK(ptr != nullptr);
> DCHECK_NE(ptr, nullptr) is preferred (clearer log message on a crash).
or free(DCHECK_NOTNULL(ptr))



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Nov 2019 23:00:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt@246
PS2, Line 246: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> It's the __attribute__((__target__("avx2"))) on the functions that use AVX2
I guess we also use our own version of binutils that has the right support as well - we ran into issues there on old OS versions that predate that ISA support.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Nov 2019 21:34:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 7:

(3 comments)

> Patch Set 7: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/19779/ : FAILURE

Unrelated test failure MaintenanceManagerTest.TestPriorityOpLaunch in TSAN.

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   }
> Right, I see what you mean; the lack of a default constructor means you can
Callers of CreateBloomFilter() need to act on the returned BFF. So if I were to use vector of unique_ptrs then the ownership would transfer to the vector and caller can't use the returned BFF directly. In that case CreateBloomFilter() could return index/handle to the created BF and add helper methods to Insert and Find using the index which is not clean/intuitive.


http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter-test.cc@59
PS7, Line 59:     CHECK_OK(bf->Init(log_space_bytes));
> Could you also test the failure condition where log_space_bytes is too larg
Done


http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc@177
PS7, Line 177: this->
> Do you actually need this-> here and below?
Yes, we need to specify the object explicitly, so "this" is unavoidable to invoke another member function via a function pointer. Took a while to figure this out.

http://umich.edu/~eecs381/handouts/Pointers_to_memberfuncs.pdf



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 05 Dec 2019 22:14:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@22
PS5, Line 22: from earlier version in Impala.
            : 
            : AVX2 operations are used which leads to following scenarios:
            : 1) Compiler lacks AVX2 support.
            : Code is conditionally compiled using USE_AVX2 macro.
            : 2) Compil
> Done.
Production builds all use gcc, so I guess we'll never be in state #1. Except for the odd ducks using parts of Kudu in ARM and what not; they're also the ones who conditionalized the AVX2 support in the thirdparty build.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   }
> shared_ptr is needed so that Close() can be invoked on the BBFs created by 
Right, I see what you mean; the lack of a default constructor means you can't constructor a vector of BBFs. However, shared_ptr is still overkill; unique_ptr should be sufficient.


http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter-test.cc@59
PS7, Line 59:     CHECK_OK(bf->Init(log_space_bytes));
Could you also test the failure condition where log_space_bytes is too large?


http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc@177
PS7, Line 177: this->
Do you actually need this-> here and below?



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 05 Dec 2019 19:22:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#3).

Change subject: Import Impala's blocked based BloomFilter
......................................................................

Import Impala's blocked based BloomFilter

This change imports Impala's blocked based BloomFilter that uses tiny BloomFilters that fit
in cache lines. Plan is to use this BloomFilter for pushing down predicate from Impala.

Added this BF as part of kudu-util and renamed to BlockBloomFilter to avoid name collision
with existing BloomFilter in kudu.

Removed dependencies on generated code like Thrift/Protobuf, Impala specific
utilities like BufferPool, CPUInfo.

Added a simple BufferAllocator interface to allow integration with Impala
that uses its own BufferPool. Added a DefaultBufferAllocator implementation derived
from earlier version of the BF in Impala.

This BF uses AVX2 operations which may not be available on target CPU when run.
So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
flag and a runtime check is made to invoke the method only if the CPU supports AVX2.

Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally
compiled.

Made modifications mostly to adhere to the coding style and make the bots pass.

TODO: Benchmarking and compare against existing Kudu BloomFilter.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
5 files changed, 770 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/3
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 3
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt@246
PS2, Line 246: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> True, there will be compilation failure on a compiler that doesn't support 
It's the __attribute__((__target__("avx2"))) on the functions that use AVX2 that does the trick. We only compile with clang and gcc 4.9 so might just not be running into the same issue.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Nov 2019 21:34:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
> Thanks Adar, Alexey and Tim for the feedback.
Works for me.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 23:51:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@22
PS5, Line 22: This BF uses AVX2 operations which may not be available on target CPU when run.
            : So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
            : flag and a runtime check is made to invoke the method only if the CPU supports AVX2.
            : 
            : Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally
            : compiled.
I'd restructure this to more clearly explain the different states:
1. Compiler lacks AVX2 support.
2. Compiler has AVX2 support but CPU at runtime does not.
3. Compiler has AVX2 support and CPU at runtime does too.

In general do you know which segment(s) of our supported distro matrix fall into state #1?


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@248
PS5, Line 248: execute_process (
Nit: execute_process(


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@256
PS5, Line 256: if (${AVX2_SUPPORT})
I think this might work too:

  if (AVX2_SUPPORT)


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@258
PS5, Line 258:   set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
Seems like COMPILE_OPTIONS is preferred now: https://cmake.org/cmake/help/latest/prop_sf/COMPILE_OPTIONS.html

I see we use COMPILE_FLAGS pretty universally though, so if you'd like to do that here and submit a follow-on change to fix up all instances, that's fine too.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   vector<shared_ptr<BlockBloomFilter>> bloom_filters_;
Based on usage, doesn't seem like shared ownership is warranted here. How about a regular vector of BBFs?


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@124
PS5, Line 124:   SeedRandom();
You can SeedRandom() once in the test fixture constructor (or SetUp) instead of in each test.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@37
PS5, Line 37: // A BloomFilter stores sets of items and offers a query operation indicating whether or
May want to add something here about how this is different from bloom_filter.{cc,h}. Same for the existing implementation: a breadcrumb of sorts so anyone who begins reading either one can immediately understand the difference, at least at a high level.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@63
PS5, Line 63:   virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0;
            :   virtual void FreeBuffer(void *ptr) = 0;
Nit: void** and void* (space after the asterisk).

Check the rest of the new files too. Same goes for references, if there are any.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@134
PS5, Line 134:   // The BloomFilter is divided up into Buckets
Nit: comments should terminate with a period, here and elsewhere.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.h@188
PS5, Line 188: 
             :   // Detect at run-time whether CPU supports AVX2
             :   bool has_avx2() const {
             :     return !FLAGS_disable_blockbloomfilter_avx2 && cpu_.has_avx2();
             :   }
These checks are on the hot path; they add a few branches to each BF operation. Could we do what bitshuffle_arch_wrapper.cc does and initialize some function pointers instead?

Doing so at BF construction time (or Init time) would be sufficient; no need to use __attribute__((constructor)) here.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@48
PS5, Line 48:   cpu_(base::CPU()) {}
Depending on how often BBFs are constructed/destroyed, this may get expensive. Do you have a sense for their expected lifetimes?


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@187
PS5, Line 187:   int ret_code = posix_memalign(ptr, 64, bytes);
Why is this alignment required? I'll take my answer in the form of a code comment.


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@192
PS5, Line 192:   DCHECK(ptr != nullptr);
DCHECK_NE(ptr, nullptr) is preferred (clearer log message on a crash).

Elsewhere too.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Nov 2019 22:27:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 9: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Dec 2019 03:09:25 +0000
Gerrit-HasComments: No

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@83
PS9, Line 83: directory_mask_ = (1ULL << std::min(63, log_num_buckets_)) - 1;
> seems 'log_num_buckets_' is always less than 63 according to Line77.
Done


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@108
PS9, Line 108: new_bucket[i] =
             :         (kRehash[i] * hash) >> ((1 << kLogBucketWordBits) - kLogBucketWordBits);
> The code would look nicer if we didn't wrap(newline) it, what do you think?
Done


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@113
PS9, Line 113:  __m128i new_bucket_sse =
             :         _mm_load_si128(reinterpret_cast<__m128i*>(new_bucket + 4 * i));
> the same.
Done


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@124
PS9, Line 124: BucketWord hval =
             :         (kRehash[i] * hash) >> ((1 << kLogBucketWordBits) - kLogBucketWordBits);
> the same.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 10 Dec 2019 19:06:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 4:

(1 comment)

> Patch Set 3:
> 
> (1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/3/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/3/src/kudu/util/CMakeLists.txt@260
PS3, Line 260:   # to know at compile time whether AVX2 support is available, hence the custom definition
> There appears to be a mechanism to set for particular file. Figuring it out
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Nov 2019 18:44:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt@246
PS2, Line 246: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
Hardcoding -mavx2 like this will cause problems in architectures that don't support AVX2. Should we take a page from bitshuffle and test for AVX2 support first?


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc@52
PS2, Line 52:   void BfInsert(const std::shared_ptr<BlockBloomFilter>& bf, uint32_t h) {
Consider adding using statements for this and the other STL usage in this file.


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc@96
PS2, Line 96:   srand(0);
Instead of these calls, consider SeedRandom() from KuduTest, which makes it easier to repro a test failure.


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@18
PS2, Line 18: #ifndef IMPALA_UTIL_BLOOM_H
            : #define IMPALA_UTIL_BLOOM_H
Replace with #pragma once


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@39
PS2, Line 39: /// A BloomFilter stores sets of items and offers a query operation indicating whether or
Nit: Kudu style uses double slashes for comments rather than triple slashes.


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@63
PS2, Line 63: class BloomFilterBufferAllocatorIf {
            :  public:
            :   virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0;
            :   virtual void FreeBuffer(void *ptr) = 0;
            : };
            : 
            : class DefaultBloomFilterBufferAllocator : public BloomFilterBufferAllocatorIf {
            :  public:
            :   // Required for Singleton
            :   DefaultBloomFilterBufferAllocator() = default;
            : 
            :   Status AllocateBuffer(size_t bytes, void **ptr) override;
            :   void FreeBuffer(void *ptr) override;
            : 
            :   static DefaultBloomFilterBufferAllocator* GetSingleton();
            :  private:
            :   DISALLOW_COPY_AND_ASSIGN(DefaultBloomFilterBufferAllocator);
            : };
Should these class names also include the word Block?



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Nov 2019 19:37:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#8).

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 776 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/8
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 8
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 12: Code-Review+2

Carrying forward Adar's +2


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 19:34:23 +0000
Gerrit-HasComments: No

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 4:

> Patch Set 4: Verified-1
> 
> Build Failed 
> 
> http://jenkins.kudu.apache.org/job/kudu-gerrit/19688/ : FAILURE

> /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/block_bloom_filter-test.cc:179
> Expected: (found) <= (find_limit * expected_fpp * 8), actual: 51 vs 48.0793
> Too many false positives with -log2(fpp) = 11

Looks like change to use SeedRandom() that uses current time instead of earlier fixed seed srand(0) is exposing some issue in the false-positive rate observed.

I took a peek why the expected range of fpp is 1x >= expected fpp <= 8x but couldn't find a good reason looking at Impala git history.

Investigating further...


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 26 Nov 2019 19:19:27 +0000
Gerrit-HasComments: No

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
> Since Impala consumes kudu_util, would it be possible to incorporate the Im
I'd also suggest just making it a first-class citizen in Kudu's util directory. In some ways it would be nice to generalise it enough that it could be consumed by impala, but I suspect it's more work overall to maintain one general-purpose implementation vs two special-purpose implementations, since the core logic of the bloom filter should be low-churn, but the integrations with Impala query execution are more subject to change, e.g. this patch https://gerrit.cloudera.org/#/c/14538/.

The buffer pool and thrift/protobuf integration on the Impala side, in particular, would require work to abstract and just not sure it's worth it.

If we did want to share things, I kinda wonder if it would be better to just have static functions that implement the core algorithms (insert, find, or, maxndv, minlogspace, falsepositiveprob) on raw arrays.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 21:32:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 4:

> Patch Set 4:
> 
> > Patch Set 4: Verified-1
> > 
> > Build Failed 
> > 
> > http://jenkins.kudu.apache.org/job/kudu-gerrit/19688/ : FAILURE
> 
> > /home/jenkins-slave/workspace/kudu-master/1/src/kudu/util/block_bloom_filter-test.cc:179
> > Expected: (found) <= (find_limit * expected_fpp * 8), actual: 51 vs 48.0793
> > Too many false positives with -log2(fpp) = 11
> 
> Looks like change to use SeedRandom() that uses current time instead of earlier fixed seed srand(0) is exposing some issue in the false-positive rate observed.
> 
> I took a peek why the expected range of fpp is 1x >= expected fpp <= 8x but couldn't find a good reason looking at Impala git history.
> 
> Investigating further...

With dist-test looping for 1000 times or so with random seed, FindInvalid test would fail ~10% for lower false positive rate (1.0/ (1 << 11)) to (1.0 / (1 << 15)).

I also tried different random number generation mechanisms but results were same.

Bumped up the range for assertion from 8x and 16x and haven't seen any failures with multiple dist test runs.


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Nov 2019 16:27:55 +0000
Gerrit-HasComments: No

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
> If I were you I'd fully kudu-ify the code - i.e. prune out unnecessary stuf
Thanks Adar, Alexey and Tim for the feedback.

This is what I plan to do:
- Don't create a separate impala/util sub-directory and instead import the BloomFilter into kudu/util. Rename it to BlockBloomFilter under kudu namespace to distinguish from the existing BloomFilter in kudu/util.
- For the buffer pool stuff, incorporate the suggestion from Todd of using an interface that allocates/de-allocates memory and add a default implementation that uses malloc/posix_memalign. This way it'll be easier to consume the BF in Impala from kudu-util.
- Use existing utilities from gutil for cpu info and pretty printer as pointed out by Alexey and Adar. So no need to those from Impala.
- Like in this change, BF will not depend on any generated code and not implement any Thrift/Protobuf related methods.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 22:44:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@12
PS1, Line 12: Plan is to use this BloomFilter for pushing down
            : predicate from Impala.
Will you also evaluate whether Impala's BF could be used in Kudu's diskrowset BFs? We can't change the on-disk BF format (well, we can, but would need to do so in a backwards compatible way), but perhaps we can adapt the on-disk state into Impala's BF in memory?


http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
Since Impala consumes kudu_util, would it be possible to incorporate the Impala bloom filter into kudu_util (leveraging existing Kudu dependencies as needed), drop it from Impala altogether, and have Impala get it via its kudu_util dependency?

I'm a little hesitant to add those Impala dependencies into Kudu, especially if there exists a way to solve this problem without doing so.


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/compiler-util.h
File src/kudu/impala/util/compiler-util.h:

PS1: 
We have a bunch (if not all) of this already in gutil/macros.h, gutil/port.h, and likely other headers in gitul.


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/cpu-info.cc
File src/kudu/impala/util/cpu-info.cc:

PS1: 
Likewise this may overlap somewhat with gutil/cpu.{cc,h} and gutil/sysinfo.{cc,h}


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/pretty-printer.h
File src/kudu/impala/util/pretty-printer.h:

PS1: 
Subsumed by gutil/strings/human_readable.{cc,h}?



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 20:58:33 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt@246
PS2, Line 246: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> Hardcoding -mavx2 like this will cause problems in architectures that don't
True, there will be compilation failure on a compiler that doesn't support AVX2. There is already a run-time CPU check.
So basically we need to support 2x2 scenarios of compiler and run-time check.

It's surprising that Impala doesn't specify -mavx2 flag and doesn't have any compiler check....

Let me figure how to achieve this.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 22 Nov 2019 20:04:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#7).

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 769 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/7
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#6).

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 769 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/6
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 6
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#9).

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 780 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/9
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#2).

Change subject: Import Impala's blocked based BloomFilter
......................................................................

Import Impala's blocked based BloomFilter

This change imports Impala's blocked based BloomFilter that uses tiny BloomFilters that fit
in cache lines. Plan is to use this BloomFilter for pushing down predicate from Impala.

Added this BF as part of kudu-util and renamed to BlockBloomFilter to avoid name collision
with existing BloomFilter in kudu.

Removed dependencies on generated code like Thrift/Protobuf, Impala specific
utilities like BufferPool, CPUInfo.

Added a simple BufferAllocator interface to allow integration with Impala
that uses its own BufferPool. Added a DefaultBufferAllocator implementation derived
from earlier version of the BF in Impala.

This BF uses AVX2 operations which may not be available on target CPU when run.
So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
flag and a runtime check is made to invoke the method only if the CPU supports AVX2.

Made modifications mostly to adhere to the coding style and make the bots pass.

TODO: Benchmarking and compare against existing Kudu BloomFilter.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
5 files changed, 739 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/2
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/14745/1//COMMIT_MSG@14
PS1, Line 14: 
            : Created a separate sub-directory kudu/impala/util
            : and retained impala namespace to distinguish
            : between existing kudu BloomFilter and this one.
> Since Impala consumes kudu_util, would it be possible to incorporate the Im
This is will be tricky
1) Impala's BF uses the BufferPool memory. I basically a used a version of BF before that change.
2) Impala's BF also relies on other auto-generated code dependencies. In this import I removed those methods. So this import was not as-is.
https://github.com/apache/impala/blob/master/be/src/util/bloom-filter.h#L29

Core of the BloomFilter is small enough that we can import the necessary piece.

If ppl think strongly about it, I can look into separating out the buffer pool logic, generated code dependency and other dependencies etc.

What do the Impala folks think?



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 21:42:52 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 9: Verified+1

Overriding Jenkins, unrelated test flake.

Anyone else want to take a look before we merge this?


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Dec 2019 00:19:56 +0000
Gerrit-HasComments: No

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(7 comments)

Just nits

http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG
Commit Message:

PS5: 
A few nits on writing commit messages as per guidelines mentioned at https://kudu.apache.org/docs/contributing.html#_submitting_patches 

Please take a look at https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines

As for formatting, please keep the lines in the description under 72 chars long, if possible.


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@7
PS5, Line 7: blocked based
block-based

?


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@9
PS5, Line 9: blocked based
block-based

?


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@10
PS5, Line 10: predicate
predicates

?


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@12
PS5, Line 12: part
a part


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@13
PS5, Line 13: kudu
Kudu


http://gerrit.cloudera.org:8080/#/c/14745/5//COMMIT_MSG@15
PS5, Line 15: generated
auto-generated



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 03 Dec 2019 04:47:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/CMakeLists.txt@246
PS2, Line 246: set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> I guess we also use our own version of binutils that has the right support 
Done


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc@52
PS2, Line 52:   void BfInsert(const std::shared_ptr<BlockBloomFilter>& bf, uint32_t h) {
> Consider adding using statements for this and the other STL usage in this f
Hmm. One of the bots complained about "using" directive. Added now.


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter-test.cc@96
PS2, Line 96:   srand(0);
> Instead of these calls, consider SeedRandom() from KuduTest, which makes it
Done


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h
File src/kudu/util/block_bloom_filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@18
PS2, Line 18: #ifndef IMPALA_UTIL_BLOOM_H
            : #define IMPALA_UTIL_BLOOM_H
> Replace with #pragma once
Done


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@39
PS2, Line 39: /// A BloomFilter stores sets of items and offers a query operation indicating whether or
> Nit: Kudu style uses double slashes for comments rather than triple slashes
Done


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter.h@63
PS2, Line 63: class BloomFilterBufferAllocatorIf {
            :  public:
            :   virtual Status AllocateBuffer(size_t bytes, void **ptr) = 0;
            :   virtual void FreeBuffer(void *ptr) = 0;
            : };
            : 
            : class DefaultBloomFilterBufferAllocator : public BloomFilterBufferAllocatorIf {
            :  public:
            :   // Required for Singleton
            :   DefaultBloomFilterBufferAllocator() = default;
            : 
            :   Status AllocateBuffer(size_t bytes, void **ptr) override;
            :   void FreeBuffer(void *ptr) override;
            : 
            :   static DefaultBloomFilterBufferAllocator* GetSingleton();
            :  private:
            :   DISALLOW_COPY_AND_ASSIGN(DefaultBloomFilterBufferAllocator);
            : };
> Should these class names also include the word Block?
Done


http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter_avx2.cc
File src/kudu/util/block_bloom_filter_avx2.cc:

http://gerrit.cloudera.org:8080/#/c/14745/2/src/kudu/util/block_bloom_filter_avx2.cc@20
PS2, Line 20: #include <immintrin.h>
> warning: #includes are not sorted properly [llvm-include-order]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 2
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Mon, 25 Nov 2019 21:34:17 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 1:

(3 comments)

I just glanced over, not trying to understand the essence.  It seems importing this comes which duplication of some utilities which are in the Kudu's codebase already.

http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/bloom-filter.h
File src/kudu/impala/util/bloom-filter.h:

http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/bloom-filter.h@22
PS1, Line 22: #include <stddef.h>
            : #include <stdint.h>
nit for here and other files: consider using C++ headers, such as cstddef and cstdint instead


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/cpu-info.cc
File src/kudu/impala/util/cpu-info.cc:

http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/cpu-info.cc@156
PS1, Line 156: #if !defined(__APPLE__)
All the code above for /proc/cpuinfo in this method doesn't work on macOS, so I'm curious why only sched_getcpu() is put under ifdef (probably, just to avoid build failures?)

BTW, it's possible to extract that information using sysctl like the existing code in CpuInfo::GetCacheInfo() (corresponding names of the sysctl variables are machdep.cpu.* like machdep.cpu.features for flags (but they are uppercase in case of macOS), and frequency can be retrieved from hw.cpufrequency)


http://gerrit.cloudera.org:8080/#/c/14745/1/src/kudu/impala/util/pretty-printer.h
File src/kudu/impala/util/pretty-printer.h:

PS1: 
Is it possible to use utilities from gutil/strings/human_readable.h instead of importing these?



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 1
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 19 Nov 2019 20:50:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@83
PS9, Line 83: directory_mask_ = (1ULL << std::min(63, log_num_buckets_)) - 1;
seems 'log_num_buckets_' is always less than 63 according to Line77.


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@108
PS9, Line 108: new_bucket[i] =
             :         (kRehash[i] * hash) >> ((1 << kLogBucketWordBits) - kLogBucketWordBits);
The code would look nicer if we didn't wrap(newline) it, what do you think?


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@113
PS9, Line 113:  __m128i new_bucket_sse =
             :         _mm_load_si128(reinterpret_cast<__m128i*>(new_bucket + 4 * i));
the same.


http://gerrit.cloudera.org:8080/#/c/14745/9/src/kudu/util/block_bloom_filter.cc@124
PS9, Line 124: BucketWord hval =
             :         (kRehash[i] * hash) >> ((1 << kLogBucketWordBits) - kLogBucketWordBits);
the same.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 10 Dec 2019 09:58:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#5).

Change subject: Import Impala's blocked based BloomFilter
......................................................................

Import Impala's blocked based BloomFilter

This change imports Impala's blocked based BloomFilter that uses tiny BloomFilters that fit
in cache lines. Plan is to use this BloomFilter for pushing down predicate from Impala.

Added this BF as part of kudu-util and renamed to BlockBloomFilter to avoid name collision
with existing BloomFilter in kudu.

Removed dependencies on generated code like Thrift/Protobuf, Impala specific
utilities like BufferPool, CPUInfo.

Added a simple BufferAllocator interface to allow integration with Impala
that uses its own BufferPool. Added a DefaultBufferAllocator implementation derived
from earlier version of the BF in Impala.

This BF uses AVX2 operations which may not be available on target CPU when run.
So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
flag and a runtime check is made to invoke the method only if the CPU supports AVX2.

Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally
compiled.

Made modifications mostly to adhere to the coding style and make the bots pass.

TODO: Benchmarking and compare against existing Kudu BloomFilter.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
5 files changed, 772 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/5
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 9: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc
File src/kudu/util/block_bloom_filter-test.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter-test.cc@88
PS5, Line 88:   ASSERT_STR_CONTAINS(s.ToString(), "Bloom filter too large");
> Callers of CreateBloomFilter() need to act on the returned BFF. So if I wer
In this situation (when there's a clear "owner" of an object, and a bunch of non-owning accesses to it), we use unique_ptr to convey ownership, and raw pointers to grant temporary access. The advantage of doing this is you can avoid the (locked) ref inc/dec operations that are incurred as shared_ptrs are handed out to new callers. When the ownership isn't clear, or when it's truly shared, we eat the cost of the inc/dec ops and use a shared_ptr.

It doesn't matter in a test where perf isn't an issue, so feel free to keep it as-is if you like.


http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/7/src/kudu/util/block_bloom_filter.cc@177
PS7, Line 177: this->
> Yes, we need to specify the object explicitly, so "this" is unavoidable to 
Ugh I hate that stuff. Thanks for figuring it out.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 9
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 06 Dec 2019 00:19:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Bankim Bhavsar has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt
File src/kudu/util/CMakeLists.txt:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/CMakeLists.txt@258
PS5, Line 258:   set_source_files_properties(block_bloom_filter_avx2.cc PROPERTIES COMPILE_FLAGS "-mavx2")
> Using the preferred method of COMPILE_OPTIONS here.
This property doesn't apply on source files, so reverting to earlier COMPILE_FLAGS.
Unfortunately locally Ubuntu 18.04 and on ve0518, compiler doesn't need explicit -mavx2 flag but it's needed on gcc 4.8.4 on jenkins gerrit.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 7
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 05 Dec 2019 18:53:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Bankim Bhavsar (Code Review)" <ge...@cloudera.org>.
Hello Thomas Tauber-Marshall, Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, Tim Armstrong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14745

to look at the new patch set (#4).

Change subject: Import Impala's blocked based BloomFilter
......................................................................

Import Impala's blocked based BloomFilter

This change imports Impala's blocked based BloomFilter that uses tiny BloomFilters that fit
in cache lines. Plan is to use this BloomFilter for pushing down predicate from Impala.

Added this BF as part of kudu-util and renamed to BlockBloomFilter to avoid name collision
with existing BloomFilter in kudu.

Removed dependencies on generated code like Thrift/Protobuf, Impala specific
utilities like BufferPool, CPUInfo.

Added a simple BufferAllocator interface to allow integration with Impala
that uses its own BufferPool. Added a DefaultBufferAllocator implementation derived
from earlier version of the BF in Impala.

This BF uses AVX2 operations which may not be available on target CPU when run.
So the AVX2 specific methods have been moved to a separate file and compiled with -mavx2
flag and a runtime check is made to invoke the method only if the CPU supports AVX2.

Compiler may not support AVX2, hence the file block_bloom_filter_avx2.cc is conditionally
compiled.

Made modifications mostly to adhere to the coding style and make the bots pass.

TODO: Benchmarking and compare against existing Kudu BloomFilter.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
5 files changed, 772 insertions(+), 0 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/45/14745/4
-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 4
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................

[util] Import Impala's block based BloomFilter

This change imports Impala's block based BloomFilter(BF) that uses tiny
BloomFilters that are space, cache and hash efficient.
Plan is to use this BloomFilter for pushing down predicates from Impala.

Added this BloomFilter in kudu-util and renamed to BlockBloomFilter to
avoid name collision with existing BloomFilter in Kudu.

Removed dependencies on auto-generated code like Thrift/Protobuf,
Impala specific utilities like BufferPool, CPUInfo.

Added a simple BlockBloomFilterBufferAllocatorIf interface to allow
integration with Impala that uses its own BufferPool.
Added a DefaultBlockBloomFilterBufferAllocator implementation derived
from earlier version in Impala.

AVX2 operations are used which leads to following scenarios:
1) Compiler lacks AVX2 support.
Code is conditionally compiled using USE_AVX2 macro.
2) Compiler supports AVX2 but CPU at runtime does not.
Function pointers used to dispatch to generic implementation.
3) Compiler supports AVX2 and CPU at runtime does too.
Function pointers used to dispatch to AVX2 specific implementation.

Made bunch of cosmetic changes to adhere to coding style.

Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Reviewed-on: http://gerrit.cloudera.org:8080/14745
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/util/CMakeLists.txt
A src/kudu/util/block_bloom_filter-test.cc
A src/kudu/util/block_bloom_filter.cc
A src/kudu/util/block_bloom_filter.h
A src/kudu/util/block_bloom_filter_avx2.cc
M src/kudu/util/bloom_filter.h
6 files changed, 777 insertions(+), 0 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, but someone else must approve
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 13
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>

[kudu-CR] Import Impala's blocked based BloomFilter

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: Import Impala's blocked based BloomFilter
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc
File src/kudu/util/block_bloom_filter.cc:

http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@52
PS5, Line 52:     "Close() should have been called before the object is destroyed.";
> a bit of a strange API, but I guess we are borrowing this from Impala
Yeah if the convention in Kudu is to release resources in the destructor, might as well do that.

The reasons for doing this in Impala were to a) make it explicit where resources are released and in what order (there's various pitfalls if you use smart pointers and have it happen implicitly) and b) avoid issues with dangling pointers to control structures - if you use the destructor to release resources, then it means you need to be sure that you don't leave dangling pointers to the object.

This pattern works well in Impala query execution where these things are important, and where we have convenient points where we can deallocate a whole graph of objects at one (the query finishing).


http://gerrit.cloudera.org:8080/#/c/14745/5/src/kudu/util/block_bloom_filter.cc@64
PS5, Line 64:   DCHECK(log_num_buckets_ <= 32) << "Bloom filter too large. log_space_bytes: "
> should this be either a CHECK or a return of InvalidArgument? seems like a 
Yeah, this invariant I think came from the Impala planner not generating filters above a size, so might not make sense in this context.



-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 5
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 27 Nov 2019 23:23:32 +0000
Gerrit-HasComments: Yes

[kudu-CR] [util] Import Impala's block based BloomFilter

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/14745 )

Change subject: [util] Import Impala's block based BloomFilter
......................................................................


Patch Set 12: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/14745
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I89c54a051c5093cf5fb81481a47a0a6677d7d906
Gerrit-Change-Number: 14745
Gerrit-PatchSet: 12
Gerrit-Owner: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bankim Bhavsar <ba...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Thomas Tauber-Marshall <tm...@cloudera.com>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Mon, 16 Dec 2019 19:21:01 +0000
Gerrit-HasComments: No