Posted to commits@accumulo.apache.org by el...@apache.org on 2014/06/20 03:25:25 UTC

[6/6] git commit: ACCUMULO-2925 Add warning about server-assigned timestamps with replication

ACCUMULO-2925 Add warning about server-assigned timestamps with replication

Leave a note about different updates to equal keys that are assigned the
same timestamp by the server.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/4d7e90ae
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/4d7e90ae
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/4d7e90ae

Branch: refs/heads/master
Commit: 4d7e90aeef3a6de6a36a30a188d5c1bc564ade3a
Parents: 0676057
Author: Josh Elser <el...@apache.org>
Authored: Thu Jun 19 17:58:10 2014 -0700
Committer: Josh Elser <el...@apache.org>
Committed: Thu Jun 19 17:58:10 2014 -0700

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/4d7e90ae/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 8755e24..5d24649 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -361,3 +361,23 @@ primary and peer. As such, the SummingCombiner wouldn't be recommended on a tabl
 While there are changes that could be made to the replication implementation which could attempt to mitigate this risk,
 presently, it is not recommended to configure Iterators or Combiners which are not idempotent to support cases where
 inaccuracy of aggregations is not acceptable.
+
+==== Server-Assigned Timestamps
+
+When the client does not provide a timestamp, Accumulo can assign one to updates made to a table. This is a
+very useful feature as it reduces the amount of code a client must write and also gives some notion of ordering to the
+updates that were made to a table (in addition to solving some very problematic Accumulo implementation details).
+However, replicating Mutations that were created with a server-assigned timestamp can be very problematic. To understand
+why, we must start at the BatchWriter.
+
+To allow for efficient ingest into Accumulo, the BatchWriter will collect many Mutations, group them into batches, and
+send them to the correct server to be applied to the appropriate Tablet. For each Mutation in a batch that the server
+receives, the server will set a timestamp that is at least as large as the last timestamp (to account for clock skew). In short,
+this means that all of the Mutations in this batch will get the same timestamp and be deduplicated in a certain order
+via the in-memory map and recorded in the write-ahead log.
+
+The problem is that these updates could be replayed on the peer in different commit sessions, which means that they
+could end up in different RFiles on disk (separate minor compactions). Because of this, Mutations with server-assigned
+timestamps which are written within the same batch may be applied in a different order on a peer. In
+the case where a user might submit multiple updates for the same Key in rapid succession, the user should ensure that
+distinct, correctly ordered timestamps are set at the client.
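
The ordering hazard described in the added section can be sketched with a small, self-contained model. This is not Accumulo code: `TimestampOrderSketch`, `Update`, `stampBatch`, and `resolve` are hypothetical names introduced only for illustration. The tie-breaking rule (on equal timestamps, the update applied last wins) is an assumption standing in for whatever order the real in-memory map and RFiles produce. The sketch only shows that equal timestamps make the result depend on apply order, while distinct client-set timestamps do not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model; none of these types are Accumulo classes.
final class TimestampOrderSketch {

    // One update to a single key: the timestamp it carries and the value written.
    static final class Update {
        final long timestamp;
        final String value;

        Update(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Models a server stamping a whole batch with one timestamp that is
    // at least as large as the last timestamp it handed out.
    static List<Update> stampBatch(List<String> values, long lastTimestamp, long serverClock) {
        long assigned = Math.max(lastTimestamp, serverClock);
        List<Update> stamped = new ArrayList<>();
        for (String value : values) {
            stamped.add(new Update(assigned, value));
        }
        return stamped;
    }

    // ASSUMPTION: the largest timestamp wins, and on a tie the update that
    // was applied last wins. Returns null for an empty list.
    static String resolve(List<Update> appliedInOrder) {
        Update winner = null;
        for (Update u : appliedInOrder) {
            if (winner == null || u.timestamp >= winner.timestamp) {
                winner = u;
            }
        }
        return winner == null ? null : winner.value;
    }

    // Returns a reversed copy, modeling a peer that applies updates in the opposite order.
    static <T> List<T> reversed(List<T> in) {
        List<T> copy = new ArrayList<>(in);
        Collections.reverse(copy);
        return copy;
    }

    public static void main(String[] args) {
        // Server-assigned: both updates in the batch share one timestamp.
        List<Update> serverStamped = stampBatch(Arrays.asList("first", "second"), 90L, 100L);
        System.out.println("server-assigned, primary: " + resolve(serverStamped));
        System.out.println("server-assigned, peer:    " + resolve(reversed(serverStamped)));

        // Client-assigned: each update carries its own increasing timestamp.
        List<Update> clientStamped =
            Arrays.asList(new Update(100L, "first"), new Update(101L, "second"));
        System.out.println("client-assigned, primary: " + resolve(clientStamped));
        System.out.println("client-assigned, peer:    " + resolve(reversed(clientStamped)));
    }
}
```

In the real client API, the equivalent step would likely be supplying the timestamp explicitly when building the Mutation, for example through a `Mutation.put` overload that accepts a `long` timestamp, instead of leaving it for the server to assign.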