You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2018/11/02 05:14:37 UTC

[kudu-CR] delta store: avoid copying deleted row data in ApplyUpdates

Hello Mike Percy, David Ribeiro Alves, Grant Henke, Todd Lipcon,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/11856

to review the following change.


Change subject: delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................

delta_store: avoid copying deleted row data in ApplyUpdates

When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.

To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.

I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).

  Time spent running 1000 scans with 0 deletes: real 0.523s	user 0.524s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.515s	user 0.516s	sys 0.000s
  Time spent running 1000 scans with 100 deletes: real 0.512s	user 0.512s	sys 0.000s
  Time spent running 1000 scans with 1000 deletes: real 0.553s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2201.029290      task-clock (msec)         #    0.991 CPUs utilized
                 5      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             4,950      page-faults               #    0.002 M/sec
     8,276,723,832      cycles                    #    3.760 GHz
    24,539,935,941      instructions              #    2.96  insn per cycle
     4,709,709,705      branches                  # 2139.776 M/sec
        12,631,579      branch-misses             #    0.27% of all branches

       2.220370506 seconds time elapsed

  Time spent running 1000 scans with 0 deletes: real 0.474s	user 0.475s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.475s	user 0.472s	sys 0.004s
  Time spent running 1000 scans with 100 deletes: real 0.478s	user 0.476s	sys 0.004s
  Time spent running 1000 scans with 1000 deletes: real 0.550s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2074.795741      task-clock (msec)         #    0.990 CPUs utilized
                23      context-switches          #    0.011 K/sec
                 1      cpu-migrations            #    0.000 K/sec
             4,951      page-faults               #    0.002 M/sec
     7,675,100,058      cycles                    #    3.699 GHz
    23,100,692,252      instructions              #    3.01  insn per cycle
     4,539,777,117      branches                  # 2188.060 M/sec
        11,819,267      branch-misses             #    0.26% of all branches

       2.096193851 seconds time elapsed

Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.

Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 1
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................

(01/05) delta_store: avoid copying deleted row data in ApplyUpdates

When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.

To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.

I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).

  Time spent running 1000 scans with 0 deletes: real 0.523s	user 0.524s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.515s	user 0.516s	sys 0.000s
  Time spent running 1000 scans with 100 deletes: real 0.512s	user 0.512s	sys 0.000s
  Time spent running 1000 scans with 1000 deletes: real 0.553s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2201.029290      task-clock (msec)         #    0.991 CPUs utilized
                 5      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             4,950      page-faults               #    0.002 M/sec
     8,276,723,832      cycles                    #    3.760 GHz
    24,539,935,941      instructions              #    2.96  insn per cycle
     4,709,709,705      branches                  # 2139.776 M/sec
        12,631,579      branch-misses             #    0.27% of all branches

       2.220370506 seconds time elapsed

  Time spent running 1000 scans with 0 deletes: real 0.474s	user 0.475s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.475s	user 0.472s	sys 0.004s
  Time spent running 1000 scans with 100 deletes: real 0.478s	user 0.476s	sys 0.004s
  Time spent running 1000 scans with 1000 deletes: real 0.550s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2074.795741      task-clock (msec)         #    0.990 CPUs utilized
                23      context-switches          #    0.011 K/sec
                 1      cpu-migrations            #    0.000 K/sec
             4,951      page-faults               #    0.002 M/sec
     7,675,100,058      cycles                    #    3.699 GHz
    23,100,692,252      instructions              #    3.01  insn per cycle
     4,539,777,117      branches                  # 2188.060 M/sec
        11,819,267      branch-misses             #    0.26% of all branches

       2.096193851 seconds time elapsed

Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.

Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Reviewed-on: http://gerrit.cloudera.org:8080/11856
Tested-by: Kudu Jenkins
Reviewed-by: Grant Henke <gr...@apache.org>
Reviewed-by: Mike Percy <mp...@apache.org>
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Grant Henke: Looks good to me, but someone else must approve
  Mike Percy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 3
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................


Patch Set 2: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 06 Nov 2018 17:48:35 +0000
Gerrit-HasComments: No

[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/11856 )

Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 08 Nov 2018 19:52:00 +0000
Gerrit-HasComments: No

[kudu-CR] (01/05) delta store: avoid copying deleted row data in ApplyUpdates

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, David Ribeiro Alves, Kudu Jenkins, Grant Henke, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11856

to look at the new patch set (#2).

Change subject: (01/05) delta_store: avoid copying deleted row data in ApplyUpdates
......................................................................

(01/05) delta_store: avoid copying deleted row data in ApplyUpdates

When applying deltas, the scan path will first populate a selection vector
with deletes (i.e. a row is unset if there's a relevant DELETE for it), and
then use it to enable two optimizations:
1. Short-circuit out if all rows in the block have been deleted.
2. Skip predicate evaluation and base data copying for deleted rows.

To this we can add a third optimization: don't apply a delta whose mutation
is to a deleted row. Note that it's not incorrect to apply such deltas (as
we'll skip deleted rows when serializing the scan response to the client);
it's just unnecessary work.

I wrote a microbenchmark to evaluate the impact of this optimization. Below
is the timing and perf stat output of the microbenchmark before and after
applying the optimization (specifically, applying this patch, then
commenting out the filtering in DeltaPrepare::ApplyUpdates for the first
run, and uncommenting it for the next run).

  Time spent running 1000 scans with 0 deletes: real 0.523s	user 0.524s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.515s	user 0.516s	sys 0.000s
  Time spent running 1000 scans with 100 deletes: real 0.512s	user 0.512s	sys 0.000s
  Time spent running 1000 scans with 1000 deletes: real 0.553s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2201.029290      task-clock (msec)         #    0.991 CPUs utilized
                 5      context-switches          #    0.002 K/sec
                 0      cpu-migrations            #    0.000 K/sec
             4,950      page-faults               #    0.002 M/sec
     8,276,723,832      cycles                    #    3.760 GHz
    24,539,935,941      instructions              #    2.96  insn per cycle
     4,709,709,705      branches                  # 2139.776 M/sec
        12,631,579      branch-misses             #    0.27% of all branches

       2.220370506 seconds time elapsed

  Time spent running 1000 scans with 0 deletes: real 0.474s	user 0.475s	sys 0.000s
  Time spent running 1000 scans with 10 deletes: real 0.475s	user 0.472s	sys 0.004s
  Time spent running 1000 scans with 100 deletes: real 0.478s	user 0.476s	sys 0.004s
  Time spent running 1000 scans with 1000 deletes: real 0.550s	user 0.552s	sys 0.000s

   Performance counter stats for 'bin/deltamemstore-test --gtest_filter=*Varying* --benchmark_num_passes=1000':

       2074.795741      task-clock (msec)         #    0.990 CPUs utilized
                23      context-switches          #    0.011 K/sec
                 1      cpu-migrations            #    0.000 K/sec
             4,951      page-faults               #    0.002 M/sec
     7,675,100,058      cycles                    #    3.699 GHz
    23,100,692,252      instructions              #    3.01  insn per cycle
     4,539,777,117      branches                  # 2188.060 M/sec
        11,819,267      branch-misses             #    0.26% of all branches

       2.096193851 seconds time elapsed

Note: I originally wrote this patch thinking it was necessary for diff scan
correctness, but have since convinced myself that it's just an optimization.

Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
---
M src/kudu/tablet/delta_applier.cc
M src/kudu/tablet/delta_applier.h
M src/kudu/tablet/delta_iterator_merger.cc
M src/kudu/tablet/delta_iterator_merger.h
M src/kudu/tablet/delta_store.cc
M src/kudu/tablet/delta_store.h
M src/kudu/tablet/deltafile-test.cc
M src/kudu/tablet/deltafile.cc
M src/kudu/tablet/deltafile.h
M src/kudu/tablet/deltamemstore-test.cc
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/tablet-test-util.h
13 files changed, 116 insertions(+), 46 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/56/11856/2
-- 
To view, visit http://gerrit.cloudera.org:8080/11856
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6a646d0816a96e9aba486c0d0a1e6b4a7e15c144
Gerrit-Change-Number: 11856
Gerrit-PatchSet: 2
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>