You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/04/06 23:45:38 UTC

[kudu-CR] tablet: revert batch-check-presence optimization for UPDATE/DELETE

Hello David Ribeiro Alves,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/6577

to review the following change.

Change subject: tablet: revert batch-check-presence optimization for UPDATE/DELETE
......................................................................

tablet: revert batch-check-presence optimization for UPDATE/DELETE

4568f2a4cbd999d5a2ab64d285205b8d30997339 implemented an optimization
which does a bulk "check presence" path for all operations in a batch
prior to applying them.

This is beneficial for INSERT, but caused a regression for UPDATE, in
particular in the case where the UPDATEs are affecting rows that are
still present in the MRS. The issue was that the old mutate path would
do:

  for each mutation:
    attempt to mutate row in MRS
    if it didn't find the row in MRS:
      for each rowset:
        attempt to mutate row in that rowset
          check if possibly present (using bloom)
          find the row index
          apply the mutation
        if successful, return

With the "bulk presence" optimization, it ended up being:

  bulk presence check:
    for each mutation/rowset pair
      try to find if it exists in any non-MRS rowset
        check bloom filters
        FindRow
        if we found it, make a note and stop checking

  for each op:
    if we noted a rowset above, apply there
      check bloom
      find the row index
    otherwise check MRS

So, in the case that the update applied to a row still in MRS, we used
to avoid doing any bloom reads, but regressed to having to check every
DRS, and only look _last_ in the MRS. Additionally even in the case that
we did find it in an existing DRS, we'd check that DRS's bloom and key
index twice instead of once.

This regression was exposed by some timeouts of the Updater thread in
linked_list-test. In this test, the updater thread "trails" the inserter
thread updating records immediately after they've been inserted.

The fix here is to basically exclude UPDATE/DELETE ops from the "bulk
presence" code path and revert to the old way of performing them. There
is probably a way we could extend the batching optimization to
UPDATE/DELETE as well, but it may require a more complex restructuring,
so I'll defer that to a TODO.

To quantify the regression and this fix, I compared the latency histogram of
1000-row batches of updates from this test:

Before bulk-presence-check patch
--------------------------------
  Count: 2454
  Mean: 1740.77
  Percentiles:
     0%  (min) = 1365
    25%        = 1638
    50%  (med) = 1731
    75%        = 1834
    95%        = 1997
    99%        = 2126
    99.9%      = 2292
    99.99%     = 2834
    100% (max) = 2835
Total rows inserted: 1929400

After bulk-presence-check-patch (regression)
-------------------------------
  Mean: 6964.93
  Percentiles:
     0%  (min) = 1578
    25%        = 4014
    50%  (med) = 6560
    75%        = 9688
    95%        = 12576 (5-6x slower)
    99%        = 13672
    99.9%      = 14752
    99.99%     = 14976
    100% (max) = 14981
Total rows inserted: 2327900

With this fix in place
-----------------------
  Count: 2471
  Mean: 1842.69
  Percentiles:
     0%  (min) = 1405
    25%        = 1748
    50%  (med) = 1835
    75%        = 1926
    95%        = 2084
    99%        = 2216
    99.9%      = 2350
    99.99%     = 2696
    100% (max) = 2697
  Total rows inserted: 2471550

The numbers aren't entirely stable across multiple runs of the benchmark
because there's a lot going on, but it seems the regression is largely
fixed.

Change-Id: I78e1e0a0293d737074e1309d2ef450c5439db2c6
---
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet.cc
2 files changed, 70 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/77/6577/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6577
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I78e1e0a0293d737074e1309d2ef450c5439db2c6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>

[kudu-CR] tablet: revert batch-check-presence optimization for UPDATE/DELETE

Posted by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org>.
David Ribeiro Alves has posted comments on this change.

Change subject: tablet: revert batch-check-presence optimization for UPDATE/DELETE
......................................................................


Patch Set 1:

(1 comment)

still want this reviewed?
(had an old unposted comment that I'll post now too)

http://gerrit.cloudera.org:8080/#/c/6577/1//COMMIT_MSG
Commit Message:

PS1, Line 37:   for each op:
            :     if we noted a rowset above, apply there
            :       check bloom
            :       find the row index
            :     otherwise check MRS
this will still regress UPSERTS that transform to MUTATES right? (I guess we should monitor perf for those better)


-- 
To view, visit http://gerrit.cloudera.org:8080/6577
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I78e1e0a0293d737074e1309d2ef450c5439db2c6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-HasComments: Yes