You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2018/11/02 05:14:37 UTC
[kudu-CR] delta store: avoid copying deleted row data in ApplyUpdates
Hello Mike Percy, David Ribeiro Alves, Grant Henke, Todd Lipcon,
I'd like you to do a code review. Please visit
http://gerrit.cloudera.org:8080/11856
to review the following change.
Change subject: delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................
delta_store: avoid copying deleted row data in ApplyUpdates
When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.
To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.
I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).
Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.515s user 0.516s sys 0.000s
Time spent running 1000 scans with 100 deletes: real 0.512s user 0.512s sys 0.000s
Time spent running 1000 scans with 1000 deletes: real 0.553s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2201.029290 task-clock (msec) # 0.991 CPUs utilized
5 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
4,950 page-faults # 0.002 M/sec
8,276,723,832 cycles # 3.760 GHz
24,539,935,941 instructions # 2.96 insn per cycle
4,709,709,705 branches # 2139.776 M/sec
12,631,579 branch-misses # 0.27% of all branches
2.220370506 seconds time elapsed
Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.475s user 0.472s sys 0.004s
Time spent running 1000 scans with 100 deletes: real 0.478s user 0.476s sys 0.004s
Time spent running 1000 scans with 1000 deletes: real 0.550s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2074.795741 task-clock (msec) # 0.990 CPUs utilized
23 context-switches # 0.011 K/sec
1 cpu-migrations # 0.000 K/sec
4,951 page-faults # 0.002 M/sec
7,675,100,058 cycles # 3.699 GHz
23,100,692,252 instructions # 3.01 insn per cycle
4,539,777,117 branches # 2188.060 M/sec
11,819,267 branch-misses # 0.26% of all branches
2.096193851 seconds time elapsed
Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.
Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/1
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11856 )
Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................
(01/05) delta_store: avoid copying deleted row data in ApplyUpdates
When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.
To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.
I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).
Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.515s user 0.516s sys 0.000s
Time spent running 1000 scans with 100 deletes: real 0.512s user 0.512s sys 0.000s
Time spent running 1000 scans with 1000 deletes: real 0.553s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2201.029290 task-clock (msec) # 0.991 CPUs utilized
5 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
4,950 page-faults # 0.002 M/sec
8,276,723,832 cycles # 3.760 GHz
24,539,935,941 instructions # 2.96 insn per cycle
4,709,709,705 branches # 2139.776 M/sec
12,631,579 branch-misses # 0.27% of all branches
2.220370506 seconds time elapsed
Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.475s user 0.472s sys 0.004s
Time spent running 1000 scans with 100 deletes: real 0.478s user 0.476s sys 0.004s
Time spent running 1000 scans with 1000 deletes: real 0.550s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2074.795741 task-clock (msec) # 0.990 CPUs utilized
23 context-switches # 0.011 K/sec
1 cpu-migrations # 0.000 K/sec
4,951 page-faults # 0.002 M/sec
7,675,100,058 cycles # 3.699 GHz
23,100,692,252 instructions # 3.01 insn per cycle
4,539,777,117 branches # 2188.060 M/sec
11,819,267 branch-misses # 0.26% of all branches
2.096193851 seconds time elapsed
Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.
Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Reviewed-on: http://gerrit.cloudera.org:8080/11856
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@apache.org>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)
Approvals:
Kudu Jenkins: Verified
Grant Henke: Looks good to me, but someone else must approve
Mike Percy: Looks good to me, approved
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 )
Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................
Patch Set 2: Code-Review+1
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 06 Nov 2018 17:48:35 +0000
Gerrit-HasComments: No
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 )
Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................
Patch Set 2: Code-Review+2
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 08 Nov 2018 19:52:00 +0000
Gerrit-HasComments: No
[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates
Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, David Ribeiro Alves, Kudu Jenkins, Grant Henke, Todd Lipcon,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/11856
to look at the new patch set (#2).
Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................
(01/05) delta_store: avoid copying deleted row data in ApplyUpdates
When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.
To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.
I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).
Time spent running 1000 scans with 0 deletes: real 0.523s user 0.524s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.515s user 0.516s sys 0.000s
Time spent running 1000 scans with 100 deletes: real 0.512s user 0.512s sys 0.000s
Time spent running 1000 scans with 1000 deletes: real 0.553s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2201.029290 task-clock (msec) # 0.991 CPUs utilized
5 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
4,950 page-faults # 0.002 M/sec
8,276,723,832 cycles # 3.760 GHz
24,539,935,941 instructions # 2.96 insn per cycle
4,709,709,705 branches # 2139.776 M/sec
12,631,579 branch-misses # 0.27% of all branches
2.220370506 seconds time elapsed
Time spent running 1000 scans with 0 deletes: real 0.474s user 0.475s sys 0.000s
Time spent running 1000 scans with 10 deletes: real 0.475s user 0.472s sys 0.004s
Time spent running 1000 scans with 100 deletes: real 0.478s user 0.476s sys 0.004s
Time spent running 1000 scans with 1000 deletes: real 0.550s user 0.552s sys 0.000s
Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':
2074.795741 task-clock (msec) # 0.990 CPUs utilized
23 context-switches # 0.011 K/sec
1 cpu-migrations # 0.000 K/sec
4,951 page-faults # 0.002 M/sec
7,675,100,058 cycles # 3.699 GHz
23,100,692,252 instructions # 3.01 insn per cycle
4,539,777,117 branches # 2188.060 M/sec
11,819,267 branch-misses # 0.26% of all branches
2.096193851 seconds time elapsed
Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.
Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/2
--
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>