You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/25 01:10:01 UTC

[kudu-CR] WIP: rowset tree: add bulk queries

Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6482

to review the following change.

Change subject: WIP: rowset_tree: add bulk queries
......................................................................

WIP: rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.
TODO: more docs to be written

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 68 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6482

to look at the new patch set (#6).

Change subject: rowset_tree: add bulk queries
......................................................................

rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.

TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:

Q=   10 R=   10     1-by-1 3 ms
Q=   10 R=   10    batched 0 ms (3.00x)
Q=  100 R=   10     1-by-1 9 ms
Q=  100 R=   10    batched 9 ms (1.11x)
Q=  500 R=   10     1-by-1 54 ms
Q=  500 R=   10    batched 57 ms (0.95x)
Q= 1000 R=   10     1-by-1 92 ms
Q= 1000 R=   10    batched 125 ms (0.74x)
Q= 5000 R=   10     1-by-1 490 ms
Q= 5000 R=   10    batched 814 ms (0.60x)
Q=   10 R=  100     1-by-1 4 ms
Q=   10 R=  100    batched 4 ms (1.25x)
Q=  100 R=  100     1-by-1 19 ms
Q=  100 R=  100    batched 22 ms (0.87x)
Q=  500 R=  100     1-by-1 112 ms
Q=  500 R=  100    batched 78 ms (1.43x)
Q= 1000 R=  100     1-by-1 210 ms
Q= 1000 R=  100    batched 181 ms (1.16x)
Q= 5000 R=  100     1-by-1 1031 ms
Q= 5000 R=  100    batched 941 ms (1.10x)
Q=   10 R=  250     1-by-1 1 ms
Q=   10 R=  250    batched 7 ms (0.13x)
Q=  100 R=  250     1-by-1 23 ms
Q=  100 R=  250    batched 20 ms (1.14x)
Q=  500 R=  250     1-by-1 166 ms
Q=  500 R=  250    batched 112 ms (1.48x)
Q= 1000 R=  250     1-by-1 333 ms
Q= 1000 R=  250    batched 207 ms (1.61x)
Q= 5000 R=  250     1-by-1 1568 ms
Q= 5000 R=  250    batched 1155 ms (1.36x)
Q=   10 R=  500     1-by-1 10 ms
Q=   10 R=  500    batched 8 ms (1.37x)
Q=  100 R=  500     1-by-1 46 ms
Q=  100 R=  500    batched 45 ms (1.02x)
Q=  500 R=  500     1-by-1 238 ms
Q=  500 R=  500    batched 169 ms (1.41x)
Q= 1000 R=  500     1-by-1 451 ms
Q= 1000 R=  500    batched 307 ms (1.47x)
Q= 5000 R=  500     1-by-1 2234 ms
Q= 5000 R=  500    batched 1487 ms (1.50x)

The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.

If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.

More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 130 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6482

to look at the new patch set (#4).

Change subject: rowset_tree: add bulk queries
......................................................................

rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.

TestRowSetTree.TestPerformance shows a 6-7x improvement for bulk query vs
one-by-one:

I0328 15:36:15.330320 30901 rowset_tree-test.cc:126] Time spent Querying rowset 1000000 times: real 0.344s	user 0.344s	sys 0.000s
I0328 15:36:15.701807 30901 rowset_tree-test.cc:142] Time spent Querying rowset 1000000 times in batch: real 0.045s	user 0.044s	sys 0.000s

More importantly, this will allow for other higher-level optimizations.

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 85 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] rowset tree: add bulk queries

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: rowset_tree: add bulk queries
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

PS4, Line 108:   const int kNumRowSets = 200;
can you increase the number of rowsets to maybe make this more realistic? feels like 1M rows in a single batch to a single tablet is over the range we usually see, whereas 200 rowsets is under the range we usually see.


http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

PS4, Line 80: callback
callbacks


-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6482

to look at the new patch set (#5).

Change subject: rowset_tree: add bulk queries
......................................................................

rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.

TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:

Q=   10 R=   10     1-by-1 3 ms
Q=   10 R=   10    batched 0 ms (3.00x)
Q=  100 R=   10     1-by-1 9 ms
Q=  100 R=   10    batched 9 ms (1.11x)
Q=  500 R=   10     1-by-1 54 ms
Q=  500 R=   10    batched 57 ms (0.95x)
Q= 1000 R=   10     1-by-1 92 ms
Q= 1000 R=   10    batched 125 ms (0.74x)
Q= 5000 R=   10     1-by-1 490 ms
Q= 5000 R=   10    batched 814 ms (0.60x)
Q=   10 R=  100     1-by-1 4 ms
Q=   10 R=  100    batched 4 ms (1.25x)
Q=  100 R=  100     1-by-1 19 ms
Q=  100 R=  100    batched 22 ms (0.87x)
Q=  500 R=  100     1-by-1 112 ms
Q=  500 R=  100    batched 78 ms (1.43x)
Q= 1000 R=  100     1-by-1 210 ms
Q= 1000 R=  100    batched 181 ms (1.16x)
Q= 5000 R=  100     1-by-1 1031 ms
Q= 5000 R=  100    batched 941 ms (1.10x)
Q=   10 R=  250     1-by-1 1 ms
Q=   10 R=  250    batched 7 ms (0.13x)
Q=  100 R=  250     1-by-1 23 ms
Q=  100 R=  250    batched 20 ms (1.14x)
Q=  500 R=  250     1-by-1 166 ms
Q=  500 R=  250    batched 112 ms (1.48x)
Q= 1000 R=  250     1-by-1 333 ms
Q= 1000 R=  250    batched 207 ms (1.61x)
Q= 5000 R=  250     1-by-1 1568 ms
Q= 5000 R=  250    batched 1155 ms (1.36x)
Q=   10 R=  500     1-by-1 10 ms
Q=   10 R=  500    batched 8 ms (1.37x)
Q=  100 R=  500     1-by-1 46 ms
Q=  100 R=  500    batched 45 ms (1.02x)
Q=  500 R=  500     1-by-1 238 ms
Q=  500 R=  500    batched 169 ms (1.41x)
Q= 1000 R=  500     1-by-1 451 ms
Q= 1000 R=  500    batched 307 ms (1.47x)
Q= 5000 R=  500     1-by-1 2234 ms
Q= 5000 R=  500    batched 1487 ms (1.50x)

The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.

If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.

More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 129 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.

Change subject: rowset_tree: add bulk queries
......................................................................


rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.

TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:

Q=   10 R=   10     1-by-1 3 ms
Q=   10 R=   10    batched 0 ms (3.00x)
Q=  100 R=   10     1-by-1 9 ms
Q=  100 R=   10    batched 9 ms (1.11x)
Q=  500 R=   10     1-by-1 54 ms
Q=  500 R=   10    batched 57 ms (0.95x)
Q= 1000 R=   10     1-by-1 92 ms
Q= 1000 R=   10    batched 125 ms (0.74x)
Q= 5000 R=   10     1-by-1 490 ms
Q= 5000 R=   10    batched 814 ms (0.60x)
Q=   10 R=  100     1-by-1 4 ms
Q=   10 R=  100    batched 4 ms (1.25x)
Q=  100 R=  100     1-by-1 19 ms
Q=  100 R=  100    batched 22 ms (0.87x)
Q=  500 R=  100     1-by-1 112 ms
Q=  500 R=  100    batched 78 ms (1.43x)
Q= 1000 R=  100     1-by-1 210 ms
Q= 1000 R=  100    batched 181 ms (1.16x)
Q= 5000 R=  100     1-by-1 1031 ms
Q= 5000 R=  100    batched 941 ms (1.10x)
Q=   10 R=  250     1-by-1 1 ms
Q=   10 R=  250    batched 7 ms (0.13x)
Q=  100 R=  250     1-by-1 23 ms
Q=  100 R=  250    batched 20 ms (1.14x)
Q=  500 R=  250     1-by-1 166 ms
Q=  500 R=  250    batched 112 ms (1.48x)
Q= 1000 R=  250     1-by-1 333 ms
Q= 1000 R=  250    batched 207 ms (1.61x)
Q= 5000 R=  250     1-by-1 1568 ms
Q= 5000 R=  250    batched 1155 ms (1.36x)
Q=   10 R=  500     1-by-1 10 ms
Q=   10 R=  500    batched 8 ms (1.37x)
Q=  100 R=  500     1-by-1 46 ms
Q=  100 R=  500    batched 45 ms (1.02x)
Q=  500 R=  500     1-by-1 238 ms
Q=  500 R=  500    batched 169 ms (1.41x)
Q= 1000 R=  500     1-by-1 451 ms
Q= 1000 R=  500    batched 307 ms (1.47x)
Q= 5000 R=  500     1-by-1 2234 ms
Q= 5000 R=  500    batched 1487 ms (1.50x)

The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.

If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.

More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Reviewed-on: http://gerrit.cloudera.org:8080/6482
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 130 insertions(+), 14 deletions(-)

Approvals:
  David Ribeiro Alves: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: rowset_tree: add bulk queries
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:

PS4, Line 108:   const int kNumRowSets = 200;
> can you increase the number of rowsets to maybe make this more realistic? f
k, changed this test to do a parameter sweep across various ratios of query points and interval count


http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

PS4, Line 80: callback
> callbacks
but it's the same callback called multiple times.


-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] rowset tree: add bulk queries

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: rowset_tree: add bulk queries
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

yeah, it seems that this always outperforms the 1-by-1 version for the cases that matter (>500 RS). nice.

http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:

PS4, Line 80: callback
> but it's the same callback called multiple times.
oh, k, never mind


-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: rowset_tree: add bulk queries
......................................................................


Patch Set 5:

hrm, gcc seems unhappy with my parameterized test. Please take a look anyway and if it looks good I'll rev to fix.

-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] WIP: rowset tree: add bulk queries

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6482

to look at the new patch set (#3).

Change subject: WIP: rowset_tree: add bulk queries
......................................................................

WIP: rowset_tree: add bulk queries

This extends the bulk query API from IntervalTree up into RowSet.

TestRowSetTree.TestPerformance shows a ~6x improvement for bulk query vs
one-by-one, and this also will allow for other higher-level
optimizations.

TODO: more docs to be written

Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 76 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins