Posted to reviews@kudu.apache.org by "helifu (Code Review)" <ge...@cloudera.org> on 2019/08/02 13:57:07 UTC
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
helifu has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13987
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
KUDU-2854 short circuit predicates on dictionary-coded columns
1. A dictionary-encoded column has no updates in the DRS, and no
entries in the dictionary match the predicate:
a) Skip the whole DRS when a flag in the cfile footer is true,
which indicates that all blocks are dict-coded;
b) Skip any leading dict-coded blocks without decoding the
dictionary words;
2. A dictionary-encoded column has updates in the DRS, but no more
deltas remain during scanning and none of the entries in the
dictionary match the predicate:
a) Skip all of the remaining dict-coded blocks when the flag
in the cfile footer is true;
b) Skip the remaining dict-coded blocks without decoding the
dictionary words;
Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
---
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/common/column_materialization_context.h
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/tablet-test-util.h
20 files changed, 561 insertions(+), 35 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/13987/1
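The decision described in the commit message can be sketched roughly as follows. This is a simplified, hypothetical model rather than the patch's code: `DictColumnModel`, `SkipDecision`, `Decide`, and the `deltas_remain` parameter are illustrative names, and the cfile footer flag is modeled as a plain boolean.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical model of one dictionary-encoded column in a DRS.
// None of these names come from the Kudu source.
struct DictColumnModel {
  std::vector<std::string> dictionary;  // distinct words in the dictionary block
  bool all_blocks_dict_coded = false;   // stands in for the cfile footer flag
};

enum class SkipDecision {
  kSkipNone,             // deltas remain or some word matches: evaluate normally
  kSkipRemainingBlocks,  // no word matches and every block is dict-coded (cases 1a/2a)
  kSkipDictCodedBlocks,  // no word matches; skip dict-coded blocks without decoding
                         // them, but still evaluate any plain-encoded blocks (1b/2b)
};

using WordPredicate = std::function<bool(const std::string&)>;

// Returns true if at least one dictionary word satisfies the predicate.
bool AnyWordMatches(const DictColumnModel& col, const WordPredicate& pred) {
  for (const auto& word : col.dictionary) {
    if (pred(word)) return true;
  }
  return false;
}

// 'deltas_remain' is false from the start in case 1 (no updates at all) and
// becomes false in case 2 once the scan is past the last row with deltas.
SkipDecision Decide(const DictColumnModel& col, const WordPredicate& pred,
                    bool deltas_remain) {
  if (deltas_remain || AnyWordMatches(col, pred)) {
    return SkipDecision::kSkipNone;
  }
  return col.all_blocks_dict_coded ? SkipDecision::kSkipRemainingBlocks
                                   : SkipDecision::kSkipDictCodedBlocks;
}
```

In case 1, "remaining blocks" covers the whole DRS; in case 2, it covers only the blocks after the last row that had deltas.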
--
To view, visit http://gerrit.cloudera.org:8080/13987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <hz...@corp.netease.com>
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 1:
(4 comments)
I reviewed the DeltaStore::may_have_deltas() piece and just scanned the rest.
Could you add a unit test for it? Either deltamemstore-test or deltafile-test should work; probably the latter is better because you can test that reinserts show up as deltas.
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/cfile/cfile_reader.h@236
PS1, Line 236: enum ShortCircuitType {
Nit: use an enum class here. The full qualification will allow you to shorten the names of the values. For example, ShortCircuitType::UNINITIALIZED, ShortCircuitType::SKIP_NONE, etc.
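For reference, a minimal sketch of the scoped-enum form suggested above. Only `UNINITIALIZED` and `SKIP_NONE` come from the comment (the remaining values are elided there); the `ToString` helper is hypothetical and only shows that every use site carries the `ShortCircuitType::` qualifier.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Scoped enum: enumerators do not leak into the enclosing scope, so short
// names cannot collide with other identifiers.
enum class ShortCircuitType {
  UNINITIALIZED,
  SKIP_NONE,
  // ... further values elided, as in the review comment
};

// Scoped enums also do not convert implicitly to int.
static_assert(!std::is_convertible<ShortCircuitType, int>::value,
              "enum class values require an explicit cast");

// Hypothetical helper: every reference is fully qualified.
std::string ToString(ShortCircuitType t) {
  switch (t) {
    case ShortCircuitType::UNINITIALIZED: return "UNINITIALIZED";
    case ShortCircuitType::SKIP_NONE:     return "SKIP_NONE";
  }
  return "UNKNOWN";
}
```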
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.h@305
PS1, Line 305: // Returns true if there might exist deltas to be applied. It is safe to
: // conservatively return true, but this would force a skip over decoder-level
: // evaluation.
Please update to describe the effect of 'col_idx'.
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc
File src/kudu/tablet/delta_store.cc:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc@494
PS1, Line 494: return may_have_deltas_ ?
: (!updates_by_col_[col_idx].empty() || !deleted_.empty() || !reinserted_.empty()) : false;
Does this regress the perf gain from KUDU-2381? Could you check?
It might be more performant to add a bitmap variant of may_have_deltas_ where each bit represents a different column. Then this becomes something like:
return may_have_deltas_ || may_have_deltas_per_col_[col_idx];
Regardless of how you handle it, you should rerun the microbenchmarks used in KUDU-2381 to ensure we don't regress that perf gain.
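A rough sketch of the per-column bitmap idea. It assumes that `may_have_deltas_` would be narrowed to track only row-level deltas (deletes and reinserts), while updates set one bit per column. `DeltaPresence` and its methods are hypothetical names, not Kudu's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-batch record of which columns may have deltas.
class DeltaPresence {
 public:
  explicit DeltaPresence(size_t num_cols)
      : num_cols_(num_cols), per_col_((num_cols + 7) / 8, 0) {}

  // Clears all state before the next batch. The bitmap is (num_cols + 7) / 8
  // bytes, e.g. 35 bytes for 280 columns, cleared with one memset call.
  void Reset() {
    row_level_ = false;
    if (!per_col_.empty()) {
      std::memset(per_col_.data(), 0, per_col_.size());
    }
  }

  // A delete or reinsert affects every column.
  void MarkRowLevelDelta() { row_level_ = true; }

  // An update touches only one column.
  void MarkColumnUpdated(size_t col_idx) {
    assert(col_idx < num_cols_);
    per_col_[col_idx / 8] |= static_cast<uint8_t>(1u << (col_idx % 8));
  }

  // Same shape as the suggested expression:
  //   may_have_deltas_ || may_have_deltas_per_col_[col_idx]
  bool MayHaveDeltas(size_t col_idx) const {
    assert(col_idx < num_cols_);
    return row_level_ || ((per_col_[col_idx / 8] >> (col_idx % 8)) & 1u);
  }

  size_t bitmap_bytes() const { return per_col_.size(); }

 private:
  size_t num_cols_;
  bool row_level_ = false;
  std::vector<uint8_t> per_col_;
};
```

Whether the per-batch clear costs more than the current emptiness checks is what rerunning the KUDU-2381 microbenchmarks would show; the sketch makes no claim either way.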
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/tablet-test-util.h
File src/kudu/tablet/tablet-test-util.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/tablet-test-util.h@336
PS1, Line 336: DCHECK_EQ(row_slice.size(), schema.byte_size() + ContiguousRowHelper::null_bitmap_size(schema));
This was already merged in d5eb68327; please rebase.
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 02 Aug 2019 17:45:36 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 2:
Sorry for the late update.
> In his commit description, Zhang Yao talks about running a benchmark; perhaps you could reach out to him and ask for more specifics?
I asked ZhangYao about the benchmark two weeks ago on WeChat, but she didn't know how to run it either. Maybe I should ask Todd for some help :)
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Fri, 30 Aug 2019 11:29:40 +0000
Gerrit-HasComments: No
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 1:
(1 comment)
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc
File src/kudu/tablet/delta_store.cc:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc@494
PS1, Line 494: return may_have_deltas_ ?
: (!updates_by_col_[col_idx].empty() || !deleted_.empty() || !reinserted_.empty()) : false;
> Hmmm, the code here is close to O(1). It is different with KUDU-2381 who ha
Wouldn't it just be a single memset across all (e.g.) 35 bytes? Internally, memset operates on 8 bytes at a time, then does one final operation for the remaining <8 bytes.
In his commit description, Zhang Yao talks about running a benchmark; perhaps you could reach out to him and ask for more specifics?
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Tue, 13 Aug 2019 07:34:15 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 1:
(4 comments)
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/cfile/cfile_reader.h
File src/kudu/cfile/cfile_reader.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/cfile/cfile_reader.h@236
PS1, Line 236: enum ShortCircuitType {
> Nit: use an enum class here. The full qualification will allow you to short
Done
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.h
File src/kudu/tablet/delta_store.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.h@305
PS1, Line 305: // Returns true if there might exist deltas to be applied. It is safe to
: // conservatively return true, but this would force a skip over decoder-level
: // evaluation.
> Please update to describe the effect of 'col_idx'.
Done
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc
File src/kudu/tablet/delta_store.cc:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/delta_store.cc@494
PS1, Line 494: return may_have_deltas_ ?
: (!updates_by_col_[col_idx].empty() || !deleted_.empty() || !reinserted_.empty()) : false;
> Does this regress the perf gain from KUDU-2381? Could you check?
Hmm, the code here is close to O(1). It is different from KUDU-2381, which loops over all 280 columns.
If we use a bitmap variable ((280 + 7) / 8 = 35 bytes), we have to clear (memset) the bitmap (35 iterations?) every time there are deltas.
In addition, it seems a bit difficult to rerun the microbenchmarks you mentioned, since the workload was not published beyond the fact that it has 280 columns.
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/tablet-test-util.h
File src/kudu/tablet/tablet-test-util.h:
http://gerrit.cloudera.org:8080/#/c/13987/1/src/kudu/tablet/tablet-test-util.h@336
PS1, Line 336: DCHECK_EQ(row_slice.size(), schema.byte_size() + ContiguousRowHelper::null_bitmap_size(schema));
> This was already merged in d5eb68327; please rebase.
Done
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 1
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Thu, 08 Aug 2019 09:59:53 +0000
Gerrit-HasComments: Yes
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13987
to look at the new patch set (#3).
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
KUDU-2854 short circuit predicates on dictionary-coded columns
1. A dictionary-encoded column has no updates in the DRS, and no
entries in the dictionary match the predicate:
a) Skip the whole DRS when a flag in the cfile footer is true,
which indicates that all blocks are dict-coded;
b) Skip any leading dict-coded blocks without decoding the
dictionary words;
2. A dictionary-encoded column has updates in the DRS, but no more
deltas remain during scanning and none of the entries in the
dictionary match the predicate:
a) Skip all of the remaining dict-coded blocks when the flag
in the cfile footer is true;
b) Skip the remaining dict-coded blocks without decoding the
dictionary words;
Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
---
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/common/column_materialization_context.h
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/tablet-test-util.h
20 files changed, 640 insertions(+), 37 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/13987/3
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 3
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13987
to look at the new patch set (#2).
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
KUDU-2854 short circuit predicates on dictionary-coded columns
1. A dictionary-encoded column has no updates in the DRS, and no
entries in the dictionary match the predicate:
a) Skip the whole DRS when a flag in the cfile footer is true,
which indicates that all blocks are dict-coded;
b) Skip any leading dict-coded blocks without decoding the
dictionary words;
2. A dictionary-encoded column has updates in the DRS, but no more
deltas remain during scanning and none of the entries in the
dictionary match the predicate:
a) Skip all of the remaining dict-coded blocks when the flag
in the cfile footer is true;
b) Skip the remaining dict-coded blocks without decoding the
dictionary words;
Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
---
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/common/column_materialization_context.h
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/tablet-test-util.h
20 files changed, 641 insertions(+), 38 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/13987/2
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 2
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/13987
to look at the new patch set (#4).
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
KUDU-2854 short circuit predicates on dictionary-coded columns
1. A dictionary-encoded column has no updates in the DRS, and no
entries in the dictionary match the predicate:
a) Skip the whole DRS when a flag in the cfile footer is true,
which indicates that all blocks are dict-coded;
b) Skip any leading dict-coded blocks without decoding the
dictionary words;
2. A dictionary-encoded column has updates in the DRS, but no more
deltas remain during scanning and none of the entries in the
dictionary match the predicate:
a) Skip all of the remaining dict-coded blocks when the flag
in the cfile footer is true;
b) Skip the remaining dict-coded blocks without decoding the
dictionary words;
Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
---
M src/kudu/cfile/binary_dict_block.h
M src/kudu/cfile/cfile-test.cc
M src/kudu/cfile/cfile.proto
M src/kudu/cfile/cfile_reader.cc
M src/kudu/cfile/cfile_reader.h
M src/kudu/cfile/cfile_writer.cc
M src/kudu/cfile/cfile_writer.h
M src/kudu/common/column_materialization_context.h
M src/kudu/tablet/cfile_set.cc
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset-test.cc
M src/kudu/tablet/tablet-test-util.h
20 files changed, 641 insertions(+), 37 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/87/13987/4
--
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 4
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
[kudu-CR] KUDU-2854 short circuit predicates on dictionary-coded columns
Posted by "helifu (Code Review)" <ge...@cloudera.org>.
helifu has posted comments on this change. ( http://gerrit.cloudera.org:8080/13987 )
Change subject: KUDU-2854 short circuit predicates on dictionary-coded columns
......................................................................
Patch Set 4:
This morning on Slack, Todd suggested that I use the 'kudu perf loadgen' tool to load data into an existing wide table with ~280 columns, since his benchmark was based on a customer workload. So I ran some benchmarks today. The conclusion is that the current patch does not regress that perf gain, since "DeltaPreparer" does not show up in the profiles. Here are the details:
##1. I ran the benchmark with this patch:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 2K of event 'cycles', Event count (approx.): 176315474
25.30% rpc worker-6290 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
12.46% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::PrepareMatchingCodeWords(kudu::ColumnMaterializationContext*)
9.17% rpc worker-6290 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
3.74% rpc worker-6290 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.93% rpc worker-6290 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
2.44% rpc worker-6290 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.83% rpc worker-6290 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::GetIteratorStats(std::vector<kudu::IteratorStats, std::allocator<kudu::IteratorStats> >*) const
1.83% rpc worker-6290 kudu-tserver [.] bshuf_shuffle_bit_eightelem_SSE_avx2
1.83% rpc worker-6290 kudu-tserver [.] operator delete[](void*, std::nothrow_t const&)
1.47% rpc worker-6290 kudu-tserver [.] LZ4_memcpy_using_offset
1.30% rpc reactor-628 [kernel.kallsyms] [k] find_busiest_group
1.10% rpc worker-6290 kudu-tserver [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
1.10% rpc worker-6290 kudu-tserver [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)
1.10% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
1.10% rpc worker-6290 kudu-tserver [.] kudu::cfile::BinaryDictBlockDecoder::GetFirstRowId() const
1.10% rpc worker-6290 kudu-tserver [.] LZ4_decompress_generic.constprop.3
0.91% rpc worker-6290 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void const*, unsigned long, unsigned long)
0.75% rpc reactor-628 libssl.so.1.0.0 [.] 0x0000000000024667
0.73% rpc worker-6290 kudu-tserver [.] kudu::ArenaBase<false>::Reset()
0.73% rpc worker-6290 kudu-tserver [.] kudu::BitmapChangeBits(unsigned char*, unsigned long, unsigned long, bool)
0.73% rpc worker-6290 libstdc++.so.6.0.21 [.] __cxxabiv1::__si_class_type_info::__do_dyncast(long, __cxxabiv1::__class_type_info::__sub_kind, __cxxabiv1::__class_type_info const*, void
0.73% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::~CFileIterator()
0.73% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::TrySkipDictCodedBlock(unsigned long*, kudu::SelectionVectorView*, kudu::cfile::BlockDecoder*) const
0.73% rpc worker-6290 kudu-tserver [.] kudu::tablet::DeltaApplier::MaterializeColumn(kudu::ColumnMaterializationContext*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 2K of event 'cycles', Event count (approx.): 233293310
22.17% rpc worker-6290 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
10.81% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::PrepareMatchingCodeWords(kudu::ColumnMaterializationContext*)
10.25% rpc worker-6290 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
5.54% rpc worker-6290 kudu-tserver [.] bshuf_shuffle_bit_eightelem_SSE_avx2
4.99% rpc worker-6290 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
2.96% rpc worker-6290 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.70% rpc reactor-628 [kernel.kallsyms] [k] find_busiest_group
1.66% rpc worker-6290 kudu-tserver [.] bshuf_trans_byte_bitrow_SSE_avx2
1.41% rpc worker-6290 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.11% rpc worker-6290 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::PrepareColumn(kudu::ColumnMaterializationContext*)
1.11% rpc worker-6290 kudu-tserver [.] Bits::Count(void const*, int)
1.06% rpc worker-6290 [kernel.kallsyms] [k] clear_page_c_e
0.88% rpc reactor-628 kudu-tserver [.] ev_run
0.83% rpc worker-6290 kudu-tserver [.] tcmalloc::CentralFreeList::ReleaseToSpans(void*)
0.83% rpc worker-6290 kudu-tserver [.] LZ4_memcpy_using_offset
0.83% rpc worker-6290 [kernel.kallsyms] [k] page_fault
0.83% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::TrySkipDictCodedBlock(unsigned long*, kudu::SelectionVectorView*, kudu::cfile::BlockDecoder*) const
0.83% rpc worker-6290 kudu-tserver [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)
0.83% rpc worker-6290 [vdso] [.] __vdso_clock_gettime
0.83% rpc worker-6290 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
0.77% rpc worker-6290 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void const*, unsigned long, unsigned long)
0.73% rpc worker-6290 kudu-tserver [.] tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, void* (*)(unsigned long))
0.69% rpc reactor-628 kudu-tserver [.] operator delete[](void*, std::nothrow_t const&)
0.67% raft [worker]-6 [kernel.kallsyms] [k] find_busiest_group
0.67% rpc reactor-628 [kernel.kallsyms] [k] cpumask_next_and
0.59% rpc reactor-628 [kernel.kallsyms] [k] find_next_bit
0.55% rpc worker-6290 kudu-tserver [.] LZ4_decompress_generic.constprop.3
0.55% rpc worker-6290 kudu-tserver [.] kudu::MaterializingIterator::MaterializeBlock(kudu::RowBlock*)
##2. I ran the benchmark without this patch:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 863 of event 'cycles', Event count (approx.): 177463370
21.86% rpc worker-6567 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu:
14.21% rpc worker-6567 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
11.29% rpc worker-6567 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
5.10% rpc worker-6567 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
3.28% rpc worker-6567 kudu-tserver [.] bshuf_shuffle_bit_eightelem_SSE_avx2
2.68% rpc worker-6567 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>, b
2.57% rpc worker-6567 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.19% rpc worker-6567 kudu-tserver [.] operator delete[](void*, std::nothrow_t const&)
1.77% rpc worker-6567 kudu-tserver [.] kudu::cfile::CFileReader::NewIterator(std::unique_ptr<kudu::cfile::CFileIterator, std::default_delete<kudu::cfile::CFileIterator> >*, kudu::
1.46% rpc worker-6567 kudu-tserver [.] kudu::cfile::BinaryDictBlockDecoder::GetFirstRowId() const
1.46% rpc worker-6567 kudu-tserver [.] kudu::SelectionVector::AnySelected() const
1.13% rpc worker-6577 [kernel.kallsyms] [k] fput
1.09% rpc worker-6567 kudu-tserver [.] LZ4_memcpy_using_offset
1.09% rpc worker-6567 kudu-tserver [.] kudu::cfile::CFileIterator::~CFileIterator()
0.86% rpc reactor-656 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.81% rpc reactor-656 [kernel.kallsyms] [k] find_busiest_group
0.78% rpc worker-6567 kudu-tserver [.] std::_Hashtable<std::string, std::pair<std::string const, kudu::ColumnPredicate>, std::allocator<std::pair<std::string const, kudu::ColumnPr
0.73% rpc worker-6567 kudu-tserver [.] kudu::BitmapChangeBits(unsigned char*, unsigned long, unsigned long, bool)
0.73% rpc worker-6567 kudu-tserver [.] kudu::tserver::Scanner::has_fulfilled_limit() const
0.73% rpc worker-6567 kudu-tserver [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)
0.73% rpc worker-6567 kudu-tserver [.] kudu::tablet::DeltaApplier::FinishBatch()
0.73% rpc worker-6567 libc-2.19.so [.] __clock_gettime
0.73% rpc worker-6567 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::InitializeSelectionVector(kudu::SelectionVector*)
0.73% rpc worker-6567 kudu-tserver [.] kudu::MonoTime::Now()
0.73% rpc worker-6567 kudu-tserver [.] kudu::UnionIterator::HasNext() const
0.73% rpc worker-6567 kudu-tserver [.] kudu::cfile::CFileIterator::PrepareBatch(unsigned long*)
0.73% rpc worker-6567 kudu-tserver [.] kudu::ColumnSchemaPB::SharedDtor()
0.73% rpc worker-6567 kudu-tserver [.] kudu::MaterializingIterator::MaterializeBlock(kudu::RowBlock*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 2K of event 'cycles', Event count (approx.): 235218822
23.63% rpc worker-6567 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
10.44% rpc worker-6567 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
9.34% rpc worker-6567 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
6.60% rpc worker-6567 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
5.22% rpc worker-6567 kudu-tserver [.] bshuf_shuffle_bit_eightelem_SSE_avx2
2.62% rpc worker-6567 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
2.20% rpc worker-6567 kudu-tserver [.] bshuf_trans_byte_bitrow_SSE_avx2
1.92% rpc worker-6567 kudu-tserver [.] LZ4_decompress_generic.constprop.3
1.37% rpc worker-6567 kudu-tserver [.] kudu::SelectionVector::AnySelected() const
1.26% rpc worker-6567 kudu-tserver [.] std::_Hashtable<std::string, std::pair<std::string const, kudu::ColumnPredicate>, std::allocator<std::pair<std::string const, kudu::ColumnP
1.20% rpc worker-6567 kudu-tserver [.] tcmalloc::CentralFreeList::Populate()
1.10% rpc worker-6567 kudu-tserver [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)
1.10% rpc worker-6567 [kernel.kallsyms] [k] clear_page_c_e
0.84% rpc worker-6567 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
0.82% rpc worker-6567 kudu-tserver [.] LZ4_memcpy_using_offset
0.82% rpc worker-6567 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::GetIteratorStats(std::vector<kudu::IteratorStats, std::allocator<kudu::IteratorStats> >*) const
0.67% rpc reactor-656 kudu-tserver [.] std::unordered_map<unsigned long, kudu::rpc::InboundCall*, std::hash<unsigned long>, std::equal_to<unsigned long>, std::allocator<std::pair
0.62% rpc reactor-656 libssl.so.1.0.0 [.] 0x0000000000023566
0.55% rpc reactor-656 libcrypto.so.1.0.0 [.] 0x00000000000a9e6f
0.55% rpc worker-6567 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::MaterializeColumn(kudu::ColumnMaterializationContext*)
##3. I ran the benchmark on the released version 1.10.1, which does not include the patch for KUDU-2381:
(a) select * from my_wide_table where c280 = <value that does not exist>
Samples: 1K of event 'cycles', Event count (approx.): 272379216
18.76% rpc worker-1038 kudu-tserver [.] kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::Start(unsigned long, int)
16.15% rpc worker-1038 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
7.60% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
7.37% rpc worker-1038 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
7.12% rpc worker-1038 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::FinishBatch()
4.51% rpc worker-1038 kudu-tserver [.] kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::MayHaveDeltas() const
3.13% rpc worker-1038 kudu-tserver [.] operator new[](unsigned long)
2.14% rpc worker-1038 kudu-tserver [.] kudu::SelectionVector::AnySelected() const
2.14% rpc worker-1038 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.67% rpc worker-1038 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
1.49% rpc worker-1038 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
1.43% rpc worker-1038 kudu-tserver [.] LZ4_decompress_fast
1.43% rpc worker-1038 kudu-tserver [.] operator delete[](void*, std::nothrow_t const&)
1.40% rpc worker-1038 kudu-tserver [.] std::_Hashtable<std::string, std::pair<std::string const, kudu::ColumnPredicate>, std::allocator<std::pair<std::string const, kudu::ColumnP
1.19% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::seeked() const
0.95% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::~CFileIterator()
0.83% rpc worker-1038 libstdc++.so.6.0.21 [.] std::_Hash_bytes(void const*, unsigned long, unsigned long)
0.71% rpc worker-1038 kudu-tserver [.] kudu::(anonymous namespace)::ShardedCache<(kudu::Cache::EvictionPolicy)1>::Lookup(kudu::Slice const&, kudu::Cache::CacheBehavior)
0.71% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::~CFileIterator()
0.71% rpc worker-1038 kudu-tserver [.] kudu::tablet::DMSIterator::~DMSIterator()
0.71% rpc worker-1038 [vdso] [.] __vdso_clock_gettime
0.71% rpc worker-1038 kudu-tserver [.] kudu::tserver::TabletServiceImpl::HandleContinueScanRequest(kudu::tserver::ScanRequestPB const*, kudu::rpc::RpcContext const*, kudu::tserve
0.71% rpc worker-1038 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::InitializeSelectionVector(kudu::SelectionVector*)
0.59% rpc worker-1038 libstdc++.so.6.0.21 [.] std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
0.56% rpc worker-1038 kudu-tserver [.] kudu::ColumnSchemaFromPB(kudu::ColumnSchemaPB const&)
0.55% rpc worker-1038 kudu-tserver [.] tcmalloc::CentralFreeList::Populate()
0.50% maintenance_sch [kernel.kallsyms] [k] queue_delayed_work_on
0.50% maintenance_sch kudu-tserver [.] kudu::tablet::DiskRowSet::DeltaMemStoreEmpty() const
0.48% rpc worker-1038 kudu-tserver [.] kudu::tablet::DeltaApplier::MaterializeColumn(kudu::ColumnMaterializationContext*)
(b) select * from my_wide_table where c280 = <value that exists>
Samples: 1K of event 'cycles', Event count (approx.): 304579571
16.99% rpc worker-1038 kudu-tserver [.] kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::Start(unsigned long, int)
14.65% rpc worker-1038 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::CopyNextAndEval(unsigned long*, kudu::ColumnMaterializationContext*, kudu::SelectionVectorView*, kudu
8.92% rpc worker-1038 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::FinishBatch()
7.86% rpc worker-1038 kudu-tserver [.] kudu::Slice::compare(kudu::Slice const&) const
6.37% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::Scan(kudu::ColumnMaterializationContext*)
5.31% rpc worker-1038 kudu-tserver [.] kudu::cfile::BinaryPlainBlockDecoder::ParseHeader()
4.25% rpc worker-1038 kudu-tserver [.] bshuf_shuffle_bit_eightelem_SSE_avx2
3.40% rpc worker-1038 kudu-tserver [.] kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::MayHaveDeltas() const
2.55% rpc worker-1038 kudu-tserver [.] LZ4_decompress_fast
2.12% rpc worker-1038 kudu-tserver [.] kudu::SelectionVector::AnySelected() const
1.91% rpc worker-1038 kudu-tserver [.] operator new[](unsigned long)
1.83% rpc worker-1038 kudu-tserver [.] boost::container::flat_map<int, std::unique_ptr<kudu::cfile::CFileReader, std::default_delete<kudu::cfile::CFileReader> >, std::less<int>,
1.49% rpc worker-1038 kudu-tserver [.] operator delete[](void*, std::nothrow_t const&)
1.36% rpc worker-1038 kudu-tserver [.] kudu::tablet::CFileSet::Iterator::CreateColumnIterators(kudu::ScanSpec const*)
1.19% rpc worker-1038 kudu-tserver [.] tcmalloc::CentralFreeList::Populate()
1.06% rpc worker-1038 kudu-tserver [.] tcmalloc::ThreadCache::ReleaseToCentralCache(tcmalloc::ThreadCache::FreeList*, unsigned int, int)
0.85% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileIterator::~CFileIterator()
0.82% rpc worker-1038 kudu-tserver [.] kudu::cfile::CFileReader::NewIterator(std::unique_ptr<kudu::cfile::CFileIterator, std::default_delete<kudu::cfile::CFileIterator> >*, kudu:
0.64% rpc worker-1038 kudu-tserver [.] std::vector<std::deque<kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::ColumnUpdate, std::allocator<kudu::tablet::DeltaPrepar
0.64% rpc worker-1038 kudu-tserver [.] std::_Deque_base<kudu::tablet::DeltaPreparer<kudu::tablet::DMSPreparerTraits>::ColumnUpdate, std::allocator<kudu::tablet::DeltaPreparer<kud
0.64% rpc worker-1038 kudu-tserver [.] kudu::(anonymous namespace)::ShardedCache<(kudu::Cache::EvictionPolicy)1>::Lookup(kudu::Slice const&, kudu::Cache::CacheBehavior)
0.64% rpc worker-1038 kudu-tserver [.] kudu::MaterializingIterator::HasNext() const
--
To view, visit http://gerrit.cloudera.org:8080/13987
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id348583cc7d85773e8f32a189f4344d7a36a30b6
Gerrit-Change-Number: 13987
Gerrit-PatchSet: 4
Gerrit-Owner: helifu <hz...@corp.netease.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: helifu <hz...@corp.netease.com>
Gerrit-Comment-Date: Thu, 05 Sep 2019 10:10:05 +0000
Gerrit-HasComments: No