You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/25 01:10:01 UTC
[kudu-CR] WIP: rowset tree: add bulk queries
Hello David Ribeiro Alves,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/6482
to review the following change.
Change subject: WIP: rowset_tree: add bulk queries
......................................................................
WIP: rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TODO: more docs to be written
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 68 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/1
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6482
to look at the new patch set (#6).
Change subject: rowset_tree: add bulk queries
......................................................................
rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:
Q= 10 R= 10 1-by-1 3 ms
Q= 10 R= 10 batched 0 ms (3.00x)
Q= 100 R= 10 1-by-1 9 ms
Q= 100 R= 10 batched 9 ms (1.11x)
Q= 500 R= 10 1-by-1 54 ms
Q= 500 R= 10 batched 57 ms (0.95x)
Q= 1000 R= 10 1-by-1 92 ms
Q= 1000 R= 10 batched 125 ms (0.74x)
Q= 5000 R= 10 1-by-1 490 ms
Q= 5000 R= 10 batched 814 ms (0.60x)
Q= 10 R= 100 1-by-1 4 ms
Q= 10 R= 100 batched 4 ms (1.25x)
Q= 100 R= 100 1-by-1 19 ms
Q= 100 R= 100 batched 22 ms (0.87x)
Q= 500 R= 100 1-by-1 112 ms
Q= 500 R= 100 batched 78 ms (1.43x)
Q= 1000 R= 100 1-by-1 210 ms
Q= 1000 R= 100 batched 181 ms (1.16x)
Q= 5000 R= 100 1-by-1 1031 ms
Q= 5000 R= 100 batched 941 ms (1.10x)
Q= 10 R= 250 1-by-1 1 ms
Q= 10 R= 250 batched 7 ms (0.13x)
Q= 100 R= 250 1-by-1 23 ms
Q= 100 R= 250 batched 20 ms (1.14x)
Q= 500 R= 250 1-by-1 166 ms
Q= 500 R= 250 batched 112 ms (1.48x)
Q= 1000 R= 250 1-by-1 333 ms
Q= 1000 R= 250 batched 207 ms (1.61x)
Q= 5000 R= 250 1-by-1 1568 ms
Q= 5000 R= 250 batched 1155 ms (1.36x)
Q= 10 R= 500 1-by-1 10 ms
Q= 10 R= 500 batched 8 ms (1.37x)
Q= 100 R= 500 1-by-1 46 ms
Q= 100 R= 500 batched 45 ms (1.02x)
Q= 500 R= 500 1-by-1 238 ms
Q= 500 R= 500 batched 169 ms (1.41x)
Q= 1000 R= 500 1-by-1 451 ms
Q= 1000 R= 500 batched 307 ms (1.47x)
Q= 5000 R= 500 1-by-1 2234 ms
Q= 5000 R= 500 batched 1487 ms (1.50x)
The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.
If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.
More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 130 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/6
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6482
to look at the new patch set (#4).
Change subject: rowset_tree: add bulk queries
......................................................................
rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TestRowSetTree.TestPerformance shows a 6-7x improvement for bulk query vs
one-by-one:
I0328 15:36:15.330320 30901 rowset_tree-test.cc:126] Time spent Querying rowset 1000000 times: real 0.344s user 0.344s sys 0.000s
I0328 15:36:15.701807 30901 rowset_tree-test.cc:142] Time spent Querying rowset 1000000 times in batch: real 0.045s user 0.044s sys 0.000s
More importantly, this will allow for other higher-level optimizations.
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 85 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/4
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
[kudu-CR] rowset tree: add bulk queries
Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.
Change subject: rowset_tree: add bulk queries
......................................................................
Patch Set 4:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:
PS4, Line 108: const int kNumRowSets = 200;
can you increase the number of rowsets to maybe make this more realistic? feels like 1M rows in a single batch to a single tablet is over the range we usually see, whereas 200 rowsets is under the range we usually see.
http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:
PS4, Line 80: callback
callbacks
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6482
to look at the new patch set (#5).
Change subject: rowset_tree: add bulk queries
......................................................................
rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:
Q= 10 R= 10 1-by-1 3 ms
Q= 10 R= 10 batched 0 ms (3.00x)
Q= 100 R= 10 1-by-1 9 ms
Q= 100 R= 10 batched 9 ms (1.11x)
Q= 500 R= 10 1-by-1 54 ms
Q= 500 R= 10 batched 57 ms (0.95x)
Q= 1000 R= 10 1-by-1 92 ms
Q= 1000 R= 10 batched 125 ms (0.74x)
Q= 5000 R= 10 1-by-1 490 ms
Q= 5000 R= 10 batched 814 ms (0.60x)
Q= 10 R= 100 1-by-1 4 ms
Q= 10 R= 100 batched 4 ms (1.25x)
Q= 100 R= 100 1-by-1 19 ms
Q= 100 R= 100 batched 22 ms (0.87x)
Q= 500 R= 100 1-by-1 112 ms
Q= 500 R= 100 batched 78 ms (1.43x)
Q= 1000 R= 100 1-by-1 210 ms
Q= 1000 R= 100 batched 181 ms (1.16x)
Q= 5000 R= 100 1-by-1 1031 ms
Q= 5000 R= 100 batched 941 ms (1.10x)
Q= 10 R= 250 1-by-1 1 ms
Q= 10 R= 250 batched 7 ms (0.13x)
Q= 100 R= 250 1-by-1 23 ms
Q= 100 R= 250 batched 20 ms (1.14x)
Q= 500 R= 250 1-by-1 166 ms
Q= 500 R= 250 batched 112 ms (1.48x)
Q= 1000 R= 250 1-by-1 333 ms
Q= 1000 R= 250 batched 207 ms (1.61x)
Q= 5000 R= 250 1-by-1 1568 ms
Q= 5000 R= 250 batched 1155 ms (1.36x)
Q= 10 R= 500 1-by-1 10 ms
Q= 10 R= 500 batched 8 ms (1.37x)
Q= 100 R= 500 1-by-1 46 ms
Q= 100 R= 500 batched 45 ms (1.02x)
Q= 500 R= 500 1-by-1 238 ms
Q= 500 R= 500 batched 169 ms (1.41x)
Q= 1000 R= 500 1-by-1 451 ms
Q= 1000 R= 500 batched 307 ms (1.47x)
Q= 5000 R= 500 1-by-1 2234 ms
Q= 5000 R= 500 batched 1487 ms (1.50x)
The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.
If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.
More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 129 insertions(+), 14 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/5
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged.
Change subject: rowset_tree: add bulk queries
......................................................................
rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TestRowSetTreePerformance.TestPerformance shows a noticeable improvement for
the cases where the number of query points is large:
Q= 10 R= 10 1-by-1 3 ms
Q= 10 R= 10 batched 0 ms (3.00x)
Q= 100 R= 10 1-by-1 9 ms
Q= 100 R= 10 batched 9 ms (1.11x)
Q= 500 R= 10 1-by-1 54 ms
Q= 500 R= 10 batched 57 ms (0.95x)
Q= 1000 R= 10 1-by-1 92 ms
Q= 1000 R= 10 batched 125 ms (0.74x)
Q= 5000 R= 10 1-by-1 490 ms
Q= 5000 R= 10 batched 814 ms (0.60x)
Q= 10 R= 100 1-by-1 4 ms
Q= 10 R= 100 batched 4 ms (1.25x)
Q= 100 R= 100 1-by-1 19 ms
Q= 100 R= 100 batched 22 ms (0.87x)
Q= 500 R= 100 1-by-1 112 ms
Q= 500 R= 100 batched 78 ms (1.43x)
Q= 1000 R= 100 1-by-1 210 ms
Q= 1000 R= 100 batched 181 ms (1.16x)
Q= 5000 R= 100 1-by-1 1031 ms
Q= 5000 R= 100 batched 941 ms (1.10x)
Q= 10 R= 250 1-by-1 1 ms
Q= 10 R= 250 batched 7 ms (0.13x)
Q= 100 R= 250 1-by-1 23 ms
Q= 100 R= 250 batched 20 ms (1.14x)
Q= 500 R= 250 1-by-1 166 ms
Q= 500 R= 250 batched 112 ms (1.48x)
Q= 1000 R= 250 1-by-1 333 ms
Q= 1000 R= 250 batched 207 ms (1.61x)
Q= 5000 R= 250 1-by-1 1568 ms
Q= 5000 R= 250 batched 1155 ms (1.36x)
Q= 10 R= 500 1-by-1 10 ms
Q= 10 R= 500 batched 8 ms (1.37x)
Q= 100 R= 500 1-by-1 46 ms
Q= 100 R= 500 batched 45 ms (1.02x)
Q= 500 R= 500 1-by-1 238 ms
Q= 500 R= 500 batched 169 ms (1.41x)
Q= 1000 R= 500 1-by-1 451 ms
Q= 1000 R= 500 batched 307 ms (1.47x)
Q= 5000 R= 500 1-by-1 2234 ms
Q= 5000 R= 500 batched 1487 ms (1.50x)
The cases where the number of query points is small relative to the number of
rowsets are worse, as predicted by big-O analysis, but in those cases the
other fixed overhead of operations (eg RPC overhead) are probably so much
larger that it isn't noticeable.
If it does turn out to be noticeable for the small-batch case, we could
easily switch to the one-at-a-time algorithm when Q << R.
More important than the above CPU optimizations, however, is that this
will allow for other higher-level optimizations during the process of
applying row operations.
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Reviewed-on: http://gerrit.cloudera.org:8080/6482
Tested-by: Kudu Jenkins
Reviewed-by: David Ribeiro Alves <dr...@apache.org>
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 130 insertions(+), 14 deletions(-)
Approvals:
David Ribeiro Alves: Looks good to me, approved
Kudu Jenkins: Verified
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: rowset_tree: add bulk queries
......................................................................
Patch Set 4:
(2 comments)
http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree-test.cc
File src/kudu/tablet/rowset_tree-test.cc:
PS4, Line 108: const int kNumRowSets = 200;
> can you increase the number of rowsets to maybe make this more realistic? f
k, changed this test to do a parameter sweep across various ratios of query points and interval count
http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:
PS4, Line 80: callback
> callbacks
but it's the same callback called multiple times.
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] rowset tree: add bulk queries
Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.
Change subject: rowset_tree: add bulk queries
......................................................................
Patch Set 6: Code-Review+2
(1 comment)
yeah, it seems that this always outperforms the 1-by-1 version for the cases that matter (>500 RS). nice.
http://gerrit.cloudera.org:8080/#/c/6482/4/src/kudu/tablet/rowset_tree.h
File src/kudu/tablet/rowset_tree.h:
PS4, Line 80: callback
> but it's the same callback called multiple times.
oh, k, never mind
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes
[kudu-CR] rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.
Change subject: rowset_tree: add bulk queries
......................................................................
Patch Set 5:
hrm, gcc seems unhappy with my parameterized test. Please take a look anyway and if it looks good I'll rev to fix.
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No
[kudu-CR] WIP: rowset tree: add bulk queries
Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6482
to look at the new patch set (#3).
Change subject: WIP: rowset_tree: add bulk queries
......................................................................
WIP: rowset_tree: add bulk queries
This extends the bulk query API from IntervalTree up into RowSet.
TestRowSetTree.TestPerformance shows a ~6x improvement for bulk query vs
one-by-one, and this also will allow for other higher-level
optimizations.
TODO: more docs to be written
Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
---
M src/kudu/tablet/rowset_tree-test.cc
M src/kudu/tablet/rowset_tree.cc
M src/kudu/tablet/rowset_tree.h
3 files changed, 76 insertions(+), 5 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/82/6482/3
--
To view, visit http://gerrit.cloudera.org:8080/6482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6ab24681dfbb3b1e6f08d52eb0647a5f3ca6851f
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins