You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2019/03/05 00:27:27 UTC

[kudu-CR] KUDU-2038-p1: Introduce CRoaring

Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/12588 )

Change subject: KUDU-2038-p1: Introduce CRoaring
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG
Commit Message:

PS2: 
Given that this patch contains some tests to benchmark CRoaring, I think it's worth adding some numbers to the commit message.


http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG@7
PS2, Line 7: KUDU-2038-p1: Introduce CRoaring
In your original patch, you had two abstractions: "data" (i.e. bitmaps and indexing), and "readers/writers" (i.e. iterators and writers) that used the bitmap data. What do you think about doing the separation as:
- bitmaps: a class that provides a clean API to the indexes/readers/writers
- indexes, iterators, and writers: the logical components that use bitmaps to evaluate predicates, write blocks, etc.

I think this would be cleaner because it:
1) hides some of the CRoaring-specific bits and encapsulates them in one place,
2) would therefore allow us to more easily move to a different bitmapping library in the future if we wanted to, and
3) would allow us reuse these "Kudu" bitmaps in other places if we wanted to without digging through CRoaring APIs.

Defining a good API, though, requires that we have a solid understanding of how we will use it, that is, the "indexes, iterators, and writers". Since you have that context, do you agree with this separation? Or do you prefer the "data" and "reader/writers" approach that you had?

FWIW I also think it would make it easier to check in this change; adding CRoaring alone with nothing else seems a bit bare. Ideally, each patch has a clear trajectory towards usefulness.


http://gerrit.cloudera.org:8080/#/c/12588/2//COMMIT_MSG@9
PS2, Line 9: Introduce CRoaring and do a simple memory and disk consumption tests.
nit: It may be obvious with the context we have, but it's probably worth noting how this fits into a larger feature. E.g.:

 This library will be used as the underlying bitmap for the upcoming bitmap indexing feature.


http://gerrit.cloudera.org:8080/#/c/12588/2/cmake_modules/FindCRoaring.cmake
File cmake_modules/FindCRoaring.cmake:

http://gerrit.cloudera.org:8080/#/c/12588/2/cmake_modules/FindCRoaring.cmake@18
PS2, Line 18: # - Find CRoaring (roaring/roaring.hh, libroaring.a, and libroaring.so)
Are all of these necessary? E.g. could we get by with just the static lib?


http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc
File src/kudu/tablet/index/croaring-test.cc:

PS2: 
nit: I think this makes more sense put into the /util directory, given there isn't anything tablet-specific about CRoaring, unless you pull more scope into this patch.


http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@37
PS2, Line 37:      TEST_TYPE_SINGLE,
            :      TEST_TYPE_PAIR,
            :      TEST_TYPE_SKIP
nit: add comments documenting what these are.


http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@66
PS2, Line 66: t.n_bytes_array_containers +
            :            t.n_bytes_run_containers +
            :            t.n_bytes_bitset_containers;
How is this different from getSizeInBytes?


http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@119
PS2, Line 119:   printf("          Single         Pair        Skip\n");
nit: I think there's a DataTable class you can use that helps print things out nicely. Same above.


http://gerrit.cloudera.org:8080/#/c/12588/2/src/kudu/tablet/index/croaring-test.cc@120
PS2, Line 120: *6*/
Remove this.



-- 
To view, visit http://gerrit.cloudera.org:8080/12588
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I878da34e388fb402c52c51364a68e738f785673d
Gerrit-Change-Number: 12588
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 05 Mar 2019 00:27:27 +0000
Gerrit-HasComments: Yes