Posted to reviews@kudu.apache.org by "helifu (Code Review)" <ge...@cloudera.org> on 2019/04/20 09:01:29 UTC

[kudu-CR] KUDU-2038: Support bitmap index

Hello Tidy Bot, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11722

to look at the new patch set (#7).

Change subject: KUDU-2038: Support bitmap index
......................................................................

KUDU-2038: Support bitmap index

A bitmap index can be added to a chosen column when creating or
altering a table, and it can be dropped again at any time.

Once an index is added, bitmap info is created for every disk
rowset. It consists of two parts: the column values and the bitmap
data. Note that memory rowsets carry no bitmap info. The bitmap
info is generated during compaction (both DiskRowSet compaction
and major compaction).
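
As a rough illustration of the two-part layout described above (column
values plus bitmap data), here is a minimal sketch. It is not the code
in this patch: the names BitmapIndexSketch and BuildBitmapIndex are
hypothetical, and std::vector<bool> stands in for a compressed bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for per-rowset bitmap info: the map keys are the
// distinct column values, and each mapped bitmap has bit i set when row i
// of the rowset holds that value. std::vector<bool> is an assumption made
// for brevity; it is not the bitmap type used by the patch.
struct BitmapIndexSketch {
  size_t num_rows = 0;
  std::map<std::string, std::vector<bool>> bitmaps;
};

// Builds the index in a single pass over one column of a rowset.
inline BitmapIndexSketch BuildBitmapIndex(
    const std::vector<std::string>& column) {
  BitmapIndexSketch index;
  index.num_rows = column.size();
  for (size_t row = 0; row < column.size(); ++row) {
    std::vector<bool>& bits = index.bitmaps[column[row]];
    // First time this value is seen: allocate an all-zero bitmap.
    if (bits.empty()) bits.resize(column.size(), false);
    assert(bits.size() == column.size());
    bits[row] = true;
  }
  return index;
}
```

With this layout the number of bitmaps equals the number of distinct
values in the rowset, which is why a low-cardinality column needs far
fewer bitmaps than a high-cardinality one.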

Mutations on an indexed column, both deletes and updates, are
allowed. The bitmap info is invalidated according to the type of
mutation: a delete affects all of the bitmap info in the disk
rowset, while an update affects only the changed column.
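
The invalidation rule above can be sketched as follows. This is only
an illustration under assumptions: RowSetIndexValidity and its methods
are hypothetical names, and tracking validity as one flag per column is
a simplification that may not match the granularity used in the patch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-rowset bookkeeping: for each indexed column, whether
// its bitmap info can still be trusted by a scan.
struct RowSetIndexValidity {
  std::map<std::string, bool> valid;  // column name -> bitmap info usable?

  void AddIndexedColumn(const std::string& col) {
    assert(!col.empty());
    valid[col] = true;
  }

  // A delete removes a whole row, so the bitmap info of every indexed
  // column in the rowset is affected.
  void OnDelete() {
    for (auto& entry : valid) entry.second = false;
  }

  // An update touches only the bitmap info of the changed column.
  void OnUpdate(const std::string& col) {
    auto it = valid.find(col);
    if (it != valid.end()) it->second = false;
  }

  bool IsUsable(const std::string& col) const {
    auto it = valid.find(col);
    return it != valid.end() && it->second;
  }
};
```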

The bitmap index suits Equality and InList predicates. The
benchmark shows that it can give a 10x speedup for unidimensional
filtering when the cardinality is sparse, and for multidimensional
filtering when the cardinality is dense.
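
To show why Equality and InList predicates map well onto a bitmap
index, here is a minimal sketch. It is not the evaluation code in this
patch: the function names are hypothetical and std::vector<bool> stands
in for a compressed bitmap. An Equality predicate is a single lookup,
an InList predicate is the union (OR) of the matching bitmaps, and
predicates on several indexed columns are combined by intersection
(AND).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Bitmap = std::vector<bool>;                    // bit i = row i matches
using ValueBitmaps = std::map<std::string, Bitmap>;  // value -> row bitmap

// Equality: return the bitmap of the requested value, or an all-zero
// bitmap when the value does not occur in this rowset.
inline Bitmap EvalEquality(const ValueBitmaps& index, size_t num_rows,
                           const std::string& value) {
  auto it = index.find(value);
  return it == index.end() ? Bitmap(num_rows, false) : it->second;
}

// InList: OR together the bitmaps of every listed value that is present.
inline Bitmap EvalInList(const ValueBitmaps& index, size_t num_rows,
                         const std::vector<std::string>& values) {
  Bitmap result(num_rows, false);
  for (const std::string& v : values) {
    auto it = index.find(v);
    if (it == index.end()) continue;
    assert(it->second.size() == num_rows);
    for (size_t i = 0; i < num_rows; ++i) {
      if (it->second[i]) result[i] = true;
    }
  }
  return result;
}

// Multi-column filtering: AND the per-column results row by row.
inline Bitmap Intersect(const Bitmap& a, const Bitmap& b) {
  assert(a.size() == b.size());
  Bitmap out(a.size(), false);
  for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] && b[i];
  return out;
}
```

Only the rows whose bit survives these operations need to be
materialized, which is where a speedup over evaluating the predicate on
every row would come from.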

==================================================================
Here is some test data:
  I created indexes on columns p_name and p_mfgr of the TPCH table
part. The former has a high cardinality, while the latter has a low
cardinality. I then loaded data into table part. When a diskrowset
(32MB) is flushed, the index is generated too.

TPCH table: part/1T/200,000,000
--------------------------------
column      cardinality
p_name      198,339,659  indexed
p_mfgr      5            indexed

I got the test data below from tserver.INFO:

"I0420 14:44:27.940719 98484 column_index_bitmap.cc:94] index info
(row count:543046, roaring count:543030, disk space:25453493,
memory:9774812) for column p_name[string NULLABLE(BITMAP)]
I0420 14:44:27.941799 98484 column_index_bitmap.cc:94] index info
(row count:543046, roaring count:5, disk space:365870,
memory:370050) for column p_mfgr[string NULLABLE(BITMAP)]"

Change-Id: I0edaa0ef1dba2dbce85ebf15f0a731e4939a7860
---
M CMakeLists.txt
A cmake_modules/FindCRoaring.cmake
M src/kudu/cfile/bshuf_block.h
M src/kudu/client/schema-internal.h
M src/kudu/client/schema.cc
M src/kudu/client/schema.h
M src/kudu/client/table_alterer-internal.cc
M src/kudu/common/common.proto
M src/kudu/common/partial_row-test.cc
M src/kudu/common/scan_spec.cc
M src/kudu/common/scan_spec.h
M src/kudu/common/schema-test.cc
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
M src/kudu/common/wire_protocol.cc
M src/kudu/gutil/manual_constructor.h
M src/kudu/gutil/port.h
M src/kudu/master/catalog_manager.cc
M src/kudu/tablet/CMakeLists.txt
M src/kudu/tablet/all_types-scan-correctness-test.cc
M src/kudu/tablet/cfile_set-test.cc
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/cfile_set.h
A src/kudu/tablet/column_index_base.h
A src/kudu/tablet/column_index_bitmap.cc
A src/kudu/tablet/column_index_bitmap.h
A src/kudu/tablet/column_index_bitmap_data.cc
A src/kudu/tablet/column_index_bitmap_data.h
A src/kudu/tablet/column_index_set-test.cc
A src/kudu/tablet/column_index_set.cc
A src/kudu/tablet/column_index_set.h
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_compaction.cc
M src/kudu/tablet/delta_compaction.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_stats.cc
M src/kudu/tablet/delta_stats.h
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.cc
M src/kudu/tablet/rowset.h
M src/kudu/tablet/rowset_metadata.cc
M src/kudu/tablet/rowset_metadata.h
M src/kudu/tablet/tablet-decoder-eval-test.cc
M src/kudu/tablet/tablet.cc
M src/kudu/tools/kudu-tool-test.cc
M thirdparty/build-definitions.sh
M thirdparty/build-thirdparty.sh
M thirdparty/download-thirdparty.sh
M thirdparty/vars.sh
59 files changed, 3,157 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/22/11722/7
-- 
To view, visit http://gerrit.cloudera.org:8080/11722
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0edaa0ef1dba2dbce85ebf15f0a731e4939a7860
Gerrit-Change-Number: 11722
Gerrit-PatchSet: 7
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: helifu <hz...@corp.netease.com>