You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/28 23:44:35 UTC

[kudu-CR] WIP: tablet: check for row presence in rowset-wise order

Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/6483

to look at the new patch set (#6).

Change subject: WIP: tablet: check for row presence in rowset-wise order
......................................................................

WIP: tablet: check for row presence in rowset-wise order

This changes the overall flow of applying batches of writes from:

  for each key in batch:
    find which rowsets may contain this key
    for each rowset:
      check if key is present in rowset
    if not present in any rowset, apply the write

to:

  sort all ops by key
  bulk-query the interval tree
  for each rowset:
    for each key which might be in that rowset, in sorted order:
      check if the key is present
      if so, mark it as such
  for each op:
    if insert:
      if it's not present in any rowset (based on above marking)
        apply it
    if mutate:
      apply to the rowset determined above

The one wrinkle not expressed in the above pseudo-code is the handling
of batches with duplicate keys (eg insert/update/delete/insert
sequences of the same PK in one batch). In those cases, the later
operations may find their keys in different RowSets based on the results
of the earlier operations, and thus we need to continue to do the
presence checks one-by-one.

The aim with this patch is that because we end up accessing each rowset
many times sequentially in ascending key order, we open up the
opportunity for two potential optimizations:

1) a thread-local cache can keep seeked iterators within that rowset, as
   well as hot blocks such as the index blocks, and reuse it cross-op
   within the same batch
2) we can detect when several of the ops in a batch all fall into a
   "gap" in a particular rowset, and "skip ahead", short circuiting a
   bunch of checks

This patch isn't likely to give big wins on its own, but the further
patches in the series ought to.

TODO: do a quick benchmark or two to make sure this doesn't _regress_.

Change-Id: I9a678bb0da130ce94aae3634c82ace2d6898ffa2
---
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
6 files changed, 297 insertions(+), 82 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6483/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a678bb0da130ce94aae3634c82ace2d6898ffa2
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot