You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2019/04/05 22:07:20 UTC

[kudu-CR] generic iterators: implement three-heap merge algorithm

Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12947

to review the following change.


Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

Another goal of the algorithm was to reduce peak memory consumption by
using rowset bounds as proxies for hot vs. cold separation, thus deferring
the loading of blocks into memory until absolutely necessary. I didn't
measure this explicitly in microbenchmarks, but I plan to explore it next.

Lastly, the new algorithm opens the door to another optimization: while
there's just one sub-iterator in the hot heap, we can copy data
block-by-block instead of row-by-row. I'll implement it in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 387 insertions(+), 111 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>