Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2020/10/08 17:54:36 UTC

[kudu-CR] KUDU-2612 p12: have MRS iteration account for txn metadata

Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16510

to look at the new patch set (#2).

Change subject: KUDU-2612 p12: have MRS iteration account for txn metadata
......................................................................

KUDU-2612 p12: have MRS iteration account for txn metadata

This patch introduces the ability to iterate through the rows of an MRS
taking into account the transaction's commit status, rather than relying
on the apply timestamps of the individual mutations therein. It does so
by adding a reference to the TxnMetadata in the MRS, and upon iteration,
if a commit timestamp exists for the transaction, using the commit
timestamp to determine relevancy.

As a refresher, the MvccManager tracks mutations by maintaining a
"current" MvccSnapshot that encapsulates timestamps for ops that have
been applied. Rather than keeping track of every applied timestamp
individually, the MvccManager also keeps track of the currently
in-flight ops and the lower bound on future ops' timestamps, as
guaranteed by the TimeManager. Taken together, these define a watermark
timestamp below which all timestamps can be considered applied, as well
as a set of higher timestamps that are considered applied, but are
higher than the earliest in-flight (not-yet-applied) op's timestamp.
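
The watermark-plus-set bookkeeping described above can be sketched as
follows. This is a minimal illustration, not Kudu's actual MvccSnapshot:
the names SketchSnapshot and MakeSnapshot, and the use of plain integers
for timestamps, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical, simplified stand-in for an MVCC snapshot (not Kudu's class).
struct SketchSnapshot {
  // Every timestamp strictly below this watermark is considered applied.
  uint64_t all_applied_before = 0;
  // Applied timestamps at or above the watermark, i.e. ops that finished
  // while an op with a lower timestamp was still in flight.
  std::set<uint64_t> applied_above_watermark;

  bool IsApplied(uint64_t ts) const {
    if (ts < all_applied_before) return true;      // fast path: watermark
    return applied_above_watermark.count(ts) > 0;  // slow path: set lookup
  }

  // A snapshot is "clean" when the watermark alone decides everything.
  bool IsClean() const { return applied_above_watermark.empty(); }
};

// Builds a snapshot from the applied timestamps, the in-flight timestamps,
// and the lower bound on the timestamps of any future ops.
SketchSnapshot MakeSnapshot(const std::set<uint64_t>& applied,
                            const std::set<uint64_t>& in_flight,
                            uint64_t future_lower_bound) {
  SketchSnapshot snap;
  // The watermark is the earliest timestamp that could still be unapplied.
  uint64_t watermark = future_lower_bound;
  if (!in_flight.empty() && *in_flight.begin() < watermark) {
    watermark = *in_flight.begin();
  }
  snap.all_applied_before = watermark;
  // Only applied timestamps at or above the watermark need to be tracked.
  for (uint64_t ts : applied) {
    if (ts >= watermark) snap.applied_above_watermark.insert(ts);
  }
  return snap;
}
```

With applied ops at timestamps 1, 2, and 5 and an op still in flight at
3, the sketch puts the watermark at 3 and keeps only 5 in the set.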

The MvccManager passes out MvccSnapshots that detail whether iterators
should consider certain timestamps as relevant to iteration. These
snapshots are used in the following ways:
- Snapshot scans:
  - The user input is a timestamp, which is used to generate an
    MvccSnapshot defined by that timestamp, i.e. all timestamps before
    it are considered applied, and all timestamps at or after it are
    not.
    - Such a snapshot is defined to be a "clean" snapshot.
  - Before iterating through data, Kudu waits for the safe time to pass
    beyond the given timestamp, and waits for all ops with lower
    timestamps to complete. Only then can Kudu safely iterate through
    mutations with certainty that relevancy can be determined via a
    simple comparison against the clean snapshot.
- Diff scans:
  - Similar to the above case, but with a second, lower input timestamp
    to serve as a lower bound on relevant mutation timestamps.
- READ_LATEST scans:
  - Unlike the above two scenarios, no input timestamp is given here.
    Instead, Kudu will use the MvccManager's current MvccSnapshot, which
    isn't guaranteed to be a clean snapshot.
  - If it can, Kudu uses the watermark to determine relevancy (fast
    path, like with clean snapshots), and if not, it falls back on the
    set of higher timestamps that are considered applied (slow path).
- Flushes and compactions:
  - Snapshots are also used in the context of flushes and compactions to
    track ops that get applied in the process of a flush or compaction,
    for the sake of duplicating ops onto new data stores if they were
    missed while swapping in the new data stores.
  - As with READ_LATEST, the snapshots used here aren't necessarily
    clean snapshots.
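
For the snapshot and diff scan cases above, relevancy against clean
snapshots reduces to simple comparisons. The sketch below is only an
illustration: CleanSnapshot and the two helper functions are
hypothetical names, and treating the diff scan's lower bound as
inclusive is an assumption, not something this message specifies.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical clean snapshot: defined entirely by a single timestamp.
struct CleanSnapshot {
  uint64_t ts;
  // Timestamps strictly before `ts` are considered applied.
  bool IsApplied(uint64_t mutation_ts) const { return mutation_ts < ts; }
};

// Snapshot scan: a mutation is relevant if the snapshot considers it applied.
bool RelevantToSnapshotScan(const CleanSnapshot& snap, uint64_t mutation_ts) {
  return snap.IsApplied(mutation_ts);
}

// Diff scan: a mutation is relevant if it is applied as of the end snapshot
// but not as of the start snapshot (assumed: start bound is inclusive).
bool RelevantToDiffScan(const CleanSnapshot& start, const CleanSnapshot& end,
                        uint64_t mutation_ts) {
  return !start.IsApplied(mutation_ts) && end.IsApplied(mutation_ts);
}
```

Both checks assume the caller has already waited for the scan timestamp
to become safe; without that wait, the comparison could miss ops that
are still in flight.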

Based on the above usages, this patch distinguishes between two types of
MvccSnapshots that encapsulate all usage today:
- kTimestamp: we are iterating as of a specific timestamp T. We must
  guarantee that iteration will see all mutations made visible before T
  (i.e. Raft committed before T for non-transaction ops, transaction
  committed before T for transaction ops). We may wait for MVCC ops to
  complete to ensure this is guaranteed. Scans in this mode are
  repeatable. Snapshot and diff scans use these snapshots.
- kLatest: we are iterating without waiting for the completion of any
  ops -- instead, we only care about seeing a view of the latest
  completed ops, regardless of whether there are non-applied ops from
  before the latest applied ops. READ_LATEST scans and flushes use these
  snapshots.

In the context of evaluating commit status in transactions, these
snapshot types behave as follows when iterating:
- kTimestamp: since we care about displaying all ops or transactions
  from before T, scanners should wait for T to become safe, and for ops
  before T to complete (including all commit MVCC ops). After waiting,
  all transactions that would have a commit timestamp lower than T will
  have a commit timestamp in their metadata. As such, it's sufficient
  that, while iterating, we look at the commit timestamp of each
  mutation and compare it to T. If no commit timestamp exists for a
  transactional mutation, it must not have committed as of T.
- kLatest: since we don't care about using a clean snapshot, it's
  sufficient to use the current snapshot, which includes transactions'
  commit MVCC ops. If that op is finished for a given transaction, Kudu
  should check whether the transaction was aborted or committed. If the
  op was not finished in the snapshot, it could not have committed.
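
The two behaviors above can be sketched as a single relevancy check for
the mutations of one transaction. This is a hedged illustration rather
than the code in this patch: SketchSnapshot, SketchTxnMetadata, their
fields, and TxnMutationsRelevant are all hypothetical, and modeling the
commit MVCC op's timestamp separately from the commit timestamp is an
assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

enum class SnapshotType { kTimestamp, kLatest };

// Hypothetical, simplified snapshot (not Kudu's MvccSnapshot).
struct SketchSnapshot {
  SnapshotType type;
  uint64_t all_applied_before;                 // watermark, or T for kTimestamp
  std::set<uint64_t> applied_above_watermark;  // empty for a clean snapshot

  bool IsApplied(uint64_t ts) const {
    return ts < all_applied_before || applied_above_watermark.count(ts) > 0;
  }
};

// Hypothetical per-transaction state referenced by the MRS.
struct SketchTxnMetadata {
  bool commit_op_started;  // has a commit MVCC op been registered?
  uint64_t commit_op_ts;   // timestamp of that op (valid if started)
  bool has_commit_ts;      // has the transaction been committed?
  uint64_t commit_ts;      // commit timestamp (valid if has_commit_ts)
  bool aborted;            // did the commit op end in an abort?
};

// Returns true if the transaction's mutations should be visible to iteration.
bool TxnMutationsRelevant(const SketchSnapshot& snap,
                          const SketchTxnMetadata& txn) {
  if (snap.type == SnapshotType::kTimestamp) {
    // The scanner has already waited for T to be safe and for earlier ops
    // to finish, so a missing commit timestamp means "not committed as of T".
    return txn.has_commit_ts && txn.commit_ts < snap.all_applied_before;
  }
  // kLatest: the commit MVCC op must be finished in the snapshot...
  if (!txn.commit_op_started || !snap.IsApplied(txn.commit_op_ts)) {
    return false;
  }
  // ...and it must have finished as a commit rather than an abort.
  return !txn.aborted && txn.has_commit_ts;
}
```

In the kTimestamp branch the correctness of the comparison rests on the
wait described above; the sketch does not model that wait.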

This only adds the APIs to the MvccManager, with some initial usage for
snapshot and diff scans in the memrowset; there is still no way to
exercise these APIs using a real tablet.

Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
---
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/metadata.proto
M src/kudu/tablet/mvcc-test.cc
M src/kudu/tablet/mvcc.cc
M src/kudu/tablet/mvcc.h
M src/kudu/tablet/tablet_metadata-test.cc
M src/kudu/tablet/tablet_metadata.cc
M src/kudu/tablet/tablet_metadata.h
M src/kudu/tablet/txn_participant.cc
11 files changed, 541 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/10/16510/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16510
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6bb02c6025eea1a327cf9d9ee1f14a38d63ae4ad
Gerrit-Change-Number: 16510
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)