You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2019/04/05 22:07:20 UTC

[kudu-CR] generic iterators: implement three-heap merge algorithm

Hello Mike Percy, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/12947

to review the following change.


Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

Another goal of the algorithm was to reduce peak memory consumption by
using rowset bounds as proxies for hot vs. cold separation, thus deferring
the loading of blocks into memory until absolutely necessary. I didn't
measure this explicitly in microbenchmarks, but I plan to explore it next.

Lastly, the new algorithm opens the door to another optimization: while
there's just one sub-iterator in the hot heap, we can copy data
block-by-block instead of row-by-row. I'll implement it in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 387 insertions(+), 111 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/1
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has removed Kudu Jenkins from this change.  ( http://gerrit.cloudera.org:8080/12947 )

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................


Removed reviewer Kudu Jenkins with the following votes:

* Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#4).

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 408 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/4
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 4
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#7).

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/7
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................


Patch Set 7: Verified+1

Overriding Jenkins, Java flake.


-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 7
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Apr 2019 03:46:15 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2466: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: KUDU-2466: implement three-heap merge algorithm
......................................................................

KUDU-2466: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Reviewed-on: http://gerrit.cloudera.org:8080/12947
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 11
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2466: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#10).

Change subject: KUDU-2466: implement three-heap merge algorithm
......................................................................

KUDU-2466: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/10
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 10
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................


Patch Set 8: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@148
PS7, Line 148:   Status Init(Arena* decoded_bounds_arena) {
It looks like this also fixes KUDU-2466, right? If so, maybe that should be included in the commit title line.


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@217
PS7, Line 217:  
nit: add /*nrows=*/


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@319
PS7, Line 319: // Three different heaps are used to optimize the merge process. To explain how
Great description.


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@420
PS7, Line 420: HOT to COLD
I think this should be from COLD to HOT.


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@739
PS7, Line 739: iter++
nit: ++iter for complex objects like iterators so the previous iterator state doesn't need to be copied and returned to the expression



-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 8
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 13 Apr 2019 04:16:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2466: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#9).

Change subject: KUDU-2466: implement three-heap merge algorithm
......................................................................

KUDU-2466: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 424 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/9
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#3).

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 408 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/3
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#6).

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 418 insertions(+), 131 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/6
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2466: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: KUDU-2466: implement three-heap merge algorithm
......................................................................


Patch Set 9:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@148
PS7, Line 148:   Status Init(Arena* decoded_bounds_arena) {
> It looks like this also fixes KUDU-2466, right? If so, maybe that should be
Oh yeah, I suppose it should.


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@217
PS7, Line 217:  
> nit: add /*nrows=*/
Done


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@420
PS7, Line 420: COLD to HOT
> I think this should be from COLD to HOT.
https://www.snopes.com/fact-check/brown-out/

(done)


http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@739
PS7, Line 739: ++iter
> nit: ++iter for complex objects like iterators so the previous iterator sta
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 13 Apr 2019 04:22:29 +0000
Gerrit-HasComments: Yes

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/12947

to look at the new patch set (#2).

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................

generic_iterators: implement three-heap merge algorithm

This patch implements a new algorithm for efficient merging in
MergeIterator. The algorithm is documented extensively in the code so
there's no point in recapping it here. There's also a high-level overview
with pseudocode available[1].

I microbenchmarked the old implementation against the new one, using both
overlapping and non-overlapping inputs with a varying number of lists and
10000 rows per list (see generic_iterators-test). Here are the scan times,
averaged across five runs:

Parameters                  | old      | new      | diff
----------------------------+----------+----------+------------------
overlapping, 10 lists       | 0.015s   | 0.0522s  | +0.0372   (0.29x)
overlapping, 100 lists      | 1.0798s  | 1.1024s  | +0.0226   (0.98x)
overlapping, 1000 lists     | 184.245s | 22.8156s | -161.4294 (8.07x)
non-overlapping, 10 lists   | 0.0126s  | 0.0196s  | +0.007    (0.64x)
non-overlapping, 100 lists  | 0.5038s  | 0.129s   | -0.3748   (3.91x)
non-overlapping, 1000 lists | 89.8626s | 0.9874s  | -88.8752  (91x)
----------------------------+----------+----------+------------------

The goal was to optimize ORDERED scans for large and mostly compacted
tablets, and the new algorithm does just that. With smaller input, the
overhead of using heaps becomes more pronounced. Overlapping input still
benefits from heap-based merging, but the overlapping defeats the hot vs.
cold optimization provided by the algorithm.

To see how it performed against real data, I tested the algorithm against a
mostly compacted "representative" 40G tablet with 1294 rowsets, all stored
on one disk. I used a new CLI tool that does a full tablet scan without
sending any data over the wire. I ran the tool three times: first with
UNORDERED scans, then with ORDERED scans using the old algorithm, and
finally with ORDERED scans using the new algorithm. Each run did six scans;
to account for any caching effects, I dropped the first scan. Results:

Parameters   | Average run time (s) | Peak RSS (kb)
-------------+----------------------+--------------
UNORDERED:   | 232                  | 710320
ORDERED, old | 33787                | 4465532
ORDERED, new | 979                  | 749440
-------------+----------------------+--------------

Lastly, the new algorithm opens the door to another optimization: while
there's just one "hot" sub-iterator, we can copy data block-by-block instead
of row-by-row. I'll implement this in a follow-up.

1. https://docs.google.com/document/d/1uP0ubjM6ulnKVCRrXtwT_dqrTWjF9tlFSRk0JN2e_O0/edit#

Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
---
M src/kudu/common/encoded_key.cc
M src/kudu/common/generic_iterators-test.cc
M src/kudu/common/generic_iterators.cc
M src/kudu/common/generic_iterators.h
M src/kudu/common/schema.cc
M src/kudu/common/schema.h
6 files changed, 408 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/47/12947/2
-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] generic iterators: implement three-heap merge algorithm

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: generic_iterators: implement three-heap merge algorithm
......................................................................


Patch Set 6:

I experimented with the different boost heaps that are available. For each one, I ran TestMerge and TestMergeNonOverlapping from generic_iterators-test with --num_rows=1000, num_lists=1000, and --num_iters=10. I averaged the wall clock times (they were fairly stable) and listed them below. For each heap, the first result is for overlapping input and the second is for non-overlapping.

  results.binomial
  2.737
  0.104
  results.d_ary_2
  1.1864
  0.0941
  results.d_ary_3
  1.2917
  0.0935
  results.d_ary_4
  1.3408
  0.0963
  results.d_ary_5
  1.5982
  0.0995
  results.fibonacci
  2.3818
  0.1063
  results.pairing
  2.4572
  0.0927
  results.skew
  1.0659
  0.0881

Based on these results, I switched from using fibonacci heaps to skew heaps.


-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 6
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 09 Apr 2019 22:20:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2466: implement three-heap merge algorithm

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/12947 )

Change subject: KUDU-2466: implement three-heap merge algorithm
......................................................................


Patch Set 9: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc
File src/kudu/common/generic_iterators.cc:

http://gerrit.cloudera.org:8080/#/c/12947/7/src/kudu/common/generic_iterators.cc@420
PS7, Line 420: COLD to HOT
> https://www.snopes.com/fact-check/brown-out/
You didn't think I'd review this patch line-by-line, eh? :)



-- 
To view, visit http://gerrit.cloudera.org:8080/12947
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6deab76a103f45c1b5042b104731e46a771a0f5d
Gerrit-Change-Number: 12947
Gerrit-PatchSet: 9
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 13 Apr 2019 05:12:56 +0000
Gerrit-HasComments: Yes