Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2016/11/14 00:39:45 UTC

[kudu-CR] KUDU-798 (part 1) - Unify leader/follower mvcc behavior

Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/5055

to look at the new patch set (#6).

Change subject: KUDU-798 (part 1) - Unify leader/follower mvcc behavior
......................................................................

KUDU-798 (part 1) - Unify leader/follower mvcc behavior

Safe time is a timestamp such that all transactions before it are
known and either completed or in-flight. Waiting for the Mvcc
snapshot at "safe time" to be "clean" yields repeatable
reads: scans of a tablet at a snapshot defined by a timestamp
will always return the same results. Proper "safe time"
advancement also improves load balancing: a scan at a clean
timestamp that is lower than "safe time" on a replica is guaranteed
to yield the same results as the same scan on the leader replica
(though maybe with a latency penalty).

Currently this timestamp is advanced within Mvcc, but this is not
natural as it conflates the consensus state (all the operations
that are being replicated and/or replayed) with the mvcc state
(all the operations that have been consensus committed and are
being applied). Furthermore, Mvcc confusingly mixes the concepts
of "safe time" and "clean time": the latter means a timestamp
such that all operations before it have completed, whereas the
former also includes the operations that are in-flight, even if
they haven't started being applied to the tablet.
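
The distinction between the two timestamps can be illustrated with a
minimal sketch. This is a toy model, not Kudu's MvccManager: the names
ToyTimeTracker, OpState, StartOp, CompleteOp and AdvanceSafeTime are
hypothetical, and it assumes integer timestamps starting at 1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical toy model (not Kudu code): each known operation is either
// in-flight or completed.
enum class OpState { kInFlight, kCompleted };

class ToyTimeTracker {
 public:
  // Records that an operation at 'ts' is known but not yet completed.
  void StartOp(uint64_t ts) {
    assert(ts > 0);  // Assumption: timestamps start at 1.
    ops_[ts] = OpState::kInFlight;
  }

  // Marks the operation at 'ts' as completed.
  void CompleteOp(uint64_t ts) { ops_[ts] = OpState::kCompleted; }

  // Declares that every operation at or before 'ts' is known.
  void AdvanceSafeTime(uint64_t ts) {
    if (ts > safe_time_) safe_time_ = ts;
  }

  // Safe time: operations at or before it are known, but may be in-flight.
  uint64_t safe_time() const { return safe_time_; }

  // Clean time: the largest timestamp, no greater than safe time, such that
  // every operation at or before it has completed.
  uint64_t clean_time() const {
    for (const auto& entry : ops_) {  // std::map iterates in timestamp order.
      if (entry.first > safe_time_) break;
      if (entry.second == OpState::kInFlight) return entry.first - 1;
    }
    return safe_time_;
  }

 private:
  std::map<uint64_t, OpState> ops_;
  uint64_t safe_time_ = 0;
};
```

In this model clean time trails safe time whenever an operation at or
below safe time is still in-flight, and catches up once it completes.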

This patch series aims at separating the two concepts and fixing
safe time advancement:
a) - Safe time advancement will be handled by consensus: The leader
can easily establish which timestamps are safe for a replica by
looking at which operations that replica knows and what the
timestamp of the last committed operation is.
b) - Mvcc will only take care of monitoring "clean time" advancement.
This makes it simpler to wait for a timestamp to be "safe" and "clean":
the caller will first wait for a timestamp to be "safe", meaning all
operations before it are known and either completed or in-flight, and
then wait for it to be "clean" in mvcc, meaning all the in-flight
operations before it have completed.
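
The ordering of the two checks described in (a) and (b) might look
roughly like the sketch below. It is non-blocking for simplicity (a real
caller would wait rather than poll), and ToySafeTime, ToyMvcc, Readiness
and CheckSnapshot are hypothetical names, not the entities this patch
series introduces. It also assumes an operation at exactly the snapshot
timestamp must complete before the snapshot is clean.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class Readiness { kNotYetSafe, kSafeButNotClean, kSafeAndClean };

// Hypothetical stand-in for the consensus-side owner of "safe time".
class ToySafeTime {
 public:
  void Advance(uint64_t ts) {
    if (ts > safe_time_) safe_time_ = ts;
  }
  bool IsSafe(uint64_t ts) const { return ts <= safe_time_; }

 private:
  uint64_t safe_time_ = 0;
};

// Hypothetical stand-in for mvcc, which only tracks in-flight operations.
class ToyMvcc {
 public:
  void StartOpAtTimestamp(uint64_t ts) { in_flight_.insert(ts); }

  void CompleteOp(uint64_t ts) {
    assert(in_flight_.count(ts) == 1);  // Must have been started first.
    in_flight_.erase(ts);
  }

  // Only meaningful once 'ts' is safe: before that, unknown operations
  // could still show up at or below 'ts'.
  bool IsClean(uint64_t ts) const {
    return in_flight_.empty() || *in_flight_.begin() > ts;
  }

 private:
  std::set<uint64_t> in_flight_;
};

// Step 1: is the timestamp safe? Step 2: is it clean in mvcc?
inline Readiness CheckSnapshot(uint64_t ts, const ToySafeTime& safe,
                               const ToyMvcc& mvcc) {
  if (!safe.IsSafe(ts)) return Readiness::kNotYetSafe;
  if (!mvcc.IsClean(ts)) return Readiness::kSafeButNotClean;
  return Readiness::kSafeAndClean;
}
```

Checking "safe" first matters in this model: asking mvcc about a
timestamp that is not yet safe could report it clean before an unknown
operation arrives below it.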

This patch in particular takes the first two steps in this direction:
1) It moves timestamp assignment out of the tablet and into the
TransactionDriver, to be done prior to pushing the operation to
consensus for replication. Follow-up patches will move it to be done
within consensus itself (though not necessarily managed by any of the
consensus classes).
2) It makes all operations be "operations at a timestamp", making
all operations have the same behavior within mvcc independently of
whether they were started at the leader or at a follower.
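
As a rough illustration of the two steps above, the sketch below assigns
a timestamp before replication on the leader and then starts the
operation in mvcc through the same path on both leader and follower. All
names here (ToyClock, ToyOp, ToyMvccAtTimestamp, PrepareOnLeader,
ReceiveOnFollower, StartInMvcc) are hypothetical and do not reflect the
actual TransactionDriver or Mvcc interfaces.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical monotonic clock handing out increasing timestamps.
class ToyClock {
 public:
  uint64_t Now() { return ++last_; }

 private:
  uint64_t last_ = 0;
};

// Hypothetical operation; the timestamp travels with it once assigned.
struct ToyOp {
  bool has_timestamp = false;
  uint64_t timestamp = 0;
};

// Hypothetical mvcc that only accepts operations at a given timestamp.
class ToyMvccAtTimestamp {
 public:
  void StartOpAtTimestamp(uint64_t ts) { in_flight_.insert(ts); }
  bool IsInFlight(uint64_t ts) const { return in_flight_.count(ts) > 0; }

 private:
  std::set<uint64_t> in_flight_;
};

// Leader side: assign the timestamp before handing the op to replication.
inline void PrepareOnLeader(ToyClock* clock, ToyOp* op) {
  assert(!op->has_timestamp);  // A timestamp is assigned only once.
  op->timestamp = clock->Now();
  op->has_timestamp = true;
}

// Follower side: the timestamp arrives with the replicated operation.
inline ToyOp ReceiveOnFollower(const ToyOp& replicated) { return replicated; }

// Shared path: both leader and follower start the op at its timestamp.
// Returns false if the op has no timestamp yet.
inline bool StartInMvcc(const ToyOp& op, ToyMvccAtTimestamp* mvcc) {
  if (!op.has_timestamp) return false;
  mvcc->StartOpAtTimestamp(op.timestamp);
  return true;
}
```

Because the timestamp is fixed before replication, the toy mvcc never
needs to know which replica originated the operation.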

Follow-up patches will completely remove the Mvcc APIs for automatic
safe time advancement and timestamp assignment, and will introduce
the new entity responsible for "safe time".

Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
---
M src/kudu/tablet/local_tablet_writer.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_peer.cc
M src/kudu/tablet/transactions/transaction_driver.cc
M src/kudu/tablet/transactions/transaction_driver.h
M src/kudu/tablet/transactions/transaction_tracker-test.cc
7 files changed, 51 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/55/5055/6
-- 
To view, visit http://gerrit.cloudera.org:8080/5055
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3ba7212f9211f585d4bef00e5ccfc24d5eece224
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: David Ribeiro Alves <dr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot