You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/03/28 23:44:35 UTC
[kudu-CR] WIP: tablet: check for row presence in rowset-wise order
Hello Kudu Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/6483
to look at the new patch set (#6).
Change subject: WIP: tablet: check for row presence in rowset-wise order
......................................................................
WIP: tablet: check for row presence in rowset-wise order
This changes the overall flow of applying batches of writes from:
for each key in batch:
find which rowsets may contain this key
for each rowset:
check if key is present in rowset
if not present in any rowset, apply the write
to:
sort all ops by key
bulk-query the interval tree
for each rowset:
for each key which might be in that rowset, in sorted order:
check if the key is present
if so, mark it as such
for each op:
if insert:
if it's not present in any rowset (based on above marking)
apply it
if mutate:
apply to the rowset determined above
The one wrinkle not expressed in the above pseudo-code is the handling
of batches with duplicate keys (eg insert/update/delete/insert
sequences of the same PK in one batch). In those cases, the later
operations may find their keys in different RowSets based on the results
of the earlier operations, and thus we need to continue to do the
presence checks one-by-one.
The aim with this patch is that because we end up accessing each rowset
many times sequentially in ascending key order, we open up the
opportunity for two potential optimizations:
1) a thread-local cache can keep seeked iterators within that rowset, as
well as hot blocks such as the index blocks, and reuse it cross-op
within the same batch
2) we can detect when several of the ops in a batch all fall into a
"gap" in a particular rowset, and "skip ahead", short circuiting a
bunch of checks
This patch isn't likely to give big wins on its own, but the further
patches in the series ought to.
TODO: do a quick benchmark or two to make sure this doesn't _regress_.
Change-Id: I9a678bb0da130ce94aae3634c82ace2d6898ffa2
---
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/tablet/row_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/transactions/write_transaction.cc
M src/kudu/tablet/transactions/write_transaction.h
6 files changed, 297 insertions(+), 82 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/83/6483/6
--
To view, visit http://gerrit.cloudera.org:8080/6483
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I9a678bb0da130ce94aae3634c82ace2d6898ffa2
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot