Posted to commits@accumulo.apache.org by el...@apache.org on 2017/08/07 21:42:11 UTC

[1/2] accumulo-website git commit: ACCUMULO-4684 Add replication table schema to docs

Repository: accumulo-website
Updated Branches:
  refs/heads/asf-site c9354697f -> fcce417af
  refs/heads/master 65f2d23f1 -> ddd5b7223


ACCUMULO-4684 Add replication table schema to docs


Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/ddd5b722
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/ddd5b722
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/ddd5b722

Branch: refs/heads/master
Commit: ddd5b722351d70060884f4f68fe34eca64c181dc
Parents: 65f2d23
Author: Josh Elser <el...@apache.org>
Authored: Mon Aug 7 17:37:46 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Mon Aug 7 17:37:46 2017 -0400

----------------------------------------------------------------------
 _docs-unreleased/administration/replication.md | 52 +++++++++++++++++++++
 1 file changed, 52 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/ddd5b722/_docs-unreleased/administration/replication.md
----------------------------------------------------------------------
diff --git a/_docs-unreleased/administration/replication.md b/_docs-unreleased/administration/replication.md
index bff89aa..9cd5586 100644
--- a/_docs-unreleased/administration/replication.md
+++ b/_docs-unreleased/administration/replication.md
@@ -383,3 +383,55 @@ data into two instances. Given some existing bulk import process which creates f
 Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
 instance using the same process. Hadoop's `distcp` command provides an easy way to copy large amounts of data to another
 HDFS instance which makes the problem of duplicating bulk imports very easy to solve.
+
+## Table Schema
+
+The following describes the kinds of keys in the replication table, their format, and their general function, to help
+readers understand what the replication table describes. Because the replication table is essentially a state machine,
+this data is often the source of truth for why Accumulo is doing what it is with respect to replication. There are
+three "sections" in this table: "repl", "work", and "order".
+
+### Repl section
+
+This section tracks a WAL file that needs to be replicated to one or more remote Accumulo tables.
+An entry records not only that the given WAL file needs to be replicated, but also that the local Accumulo table,
+identified by the column qualifier "local table ID", has information in this WAL file.
+
+The structure of the key-value is as follows:
+
+```
+<HDFS_uri_to_WAL> repl:<local_table_id> [] -> <protobuf>
+```
+
+This entry is created based on a replication entry from the Accumulo metadata table, and is deleted from the replication table
+when the WAL has been fully replicated to all remote Accumulo tables.
+
+### Work section
+
+This section is for the tracking of a WAL file that needs to be replicated to a single Accumulo table in a remote
+Accumulo cluster. If a WAL must be replicated to multiple tables, there will be multiple entries. The Value for this
+Key is a serialized ProtocolBuffer message which encapsulates the portion of the WAL which was already sent for
+this file. The "replication target" is the unique location of where the file needs to be replicated: the identifier
+for the remote Accumulo cluster and the table ID in that remote Accumulo cluster. The protocol buffer in the value
+tracks the progress of replication to the remote cluster.
+
+```
+<HDFS_uri_to_WAL> work:<replication_target> [] -> <protobuf>
+```
+
+The "work" entry is created when a WAL has an "order" entry, and deleted after the WAL is replicated to all
+necessary remote clusters.
+
+### Order section
+
+This section is used to order and schedule (create) replication work. In some cases, data with the same timestamp
+may be provided multiple times. When that happens, it is important that WALs are replicated in the same order they were
+created/used. Whether or not ordering matters for the data, the order entry ensures that the oldest WALs
+are processed first and pushed through the replication framework.
+
+```
+<time_of_WAL_closing>\x00<HDFS_uri_to_WAL> order:<local_table_id> [] -> <protobuf>
+```
+
+The "order" entry is created when the WAL is closed (no longer being written to) and is removed when
+the WAL is fully replicated to all remote locations.
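
Reviewer's illustration (not part of the commit): the three key layouts documented above can be told apart by column family alone, and only the "order" row carries a prefix before the WAL URI. The Python sketch below shows one way a reader might split such keys when inspecting a scan of the replication table. It is a minimal sketch under stated assumptions: the names `ReplicationKey` and `parse_replication_key` are hypothetical, rows are treated as already-decoded strings, and the closing time is kept as an opaque string because the docs do not say how it is encoded.

```python
from typing import NamedTuple, Optional

# Column families named in the schema docs above.
SECTIONS = {"repl", "work", "order"}


class ReplicationKey(NamedTuple):
    """Hypothetical container for one parsed replication-table key."""
    section: str                # "repl", "work", or "order"
    wal_uri: str                # HDFS URI of the WAL
    qualifier: str              # local table ID, or replication target for "work"
    closed_time: Optional[str]  # only set for "order"; encoding is an assumption


def parse_replication_key(row: str, family: str, qualifier: str) -> ReplicationKey:
    """Split a key according to the layouts in the docs.

    Assumption: the row has already been decoded to text. Real rows are
    bytes, and the time prefix may not be a printable string.
    """
    if family not in SECTIONS:
        raise ValueError(f"unknown section: {family!r}")
    if family == "order":
        # Layout: <time_of_WAL_closing>\x00<HDFS_uri_to_WAL>
        closed_time, sep, wal_uri = row.partition("\x00")
        if not sep:
            raise ValueError("order row is missing the \\x00 separator")
        return ReplicationKey(family, wal_uri, qualifier, closed_time)
    # "repl" and "work" rows are just the WAL URI.
    return ReplicationKey(family, row, qualifier, None)
```

For example, `parse_replication_key("hdfs://nn/wal/1", "work", "peer-a")` (a made-up URI and target) yields a key whose `closed_time` is `None`, while an "order" row is split at the first `\x00`.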


[2/2] accumulo-website git commit: Jekyll build from master:ddd5b72

Posted by el...@apache.org.
Jekyll build from master:ddd5b72

ACCUMULO-4684 Add replication table schema to docs


Project: http://git-wip-us.apache.org/repos/asf/accumulo-website/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo-website/commit/fcce417a
Tree: http://git-wip-us.apache.org/repos/asf/accumulo-website/tree/fcce417a
Diff: http://git-wip-us.apache.org/repos/asf/accumulo-website/diff/fcce417a

Branch: refs/heads/asf-site
Commit: fcce417afae30a99dae2295d3cb8c6fc6c6e6873
Parents: c935469
Author: Josh Elser <el...@apache.org>
Authored: Mon Aug 7 17:41:34 2017 -0400
Committer: Josh Elser <el...@apache.org>
Committed: Mon Aug 7 17:41:34 2017 -0400

----------------------------------------------------------------------
 docs/unreleased/administration/replication.html | 52 ++++++++++++++++++++
 feed.xml                                        |  4 +-
 2 files changed, 54 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/fcce417a/docs/unreleased/administration/replication.html
----------------------------------------------------------------------
diff --git a/docs/unreleased/administration/replication.html b/docs/unreleased/administration/replication.html
index 4aca63b..63dec5f 100644
--- a/docs/unreleased/administration/replication.html
+++ b/docs/unreleased/administration/replication.html
@@ -760,6 +760,58 @@ Accumulo instance, it is trivial to copy those files to a new HDFS instance and
 instance using the same process. Hadoop’s <code class="highlighter-rouge">distcp</code> command provides an easy way to copy large amounts of data to another
 HDFS instance which makes the problem of duplicating bulk imports very easy to solve.</p>
 
+<h2 id="table-schema">Table Schema</h2>
+
+<p>The following describes the kinds of keys in the replication table, their format, and their general function, to help
+readers understand what the replication table describes. Because the replication table is essentially a state machine,
+this data is often the source of truth for why Accumulo is doing what it is with respect to replication. There are
+three “sections” in this table: “repl”, “work”, and “order”.</p>
+
+<h3 id="repl-section">Repl section</h3>
+
+<p>This section tracks a WAL file that needs to be replicated to one or more remote Accumulo tables.
+An entry records not only that the given WAL file needs to be replicated, but also that the local Accumulo table,
+identified by the column qualifier “local table ID”, has information in this WAL file.</p>
+
+<p>The structure of the key-value is as follows:</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;HDFS_uri_to_WAL&gt; repl:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+
+<p>This entry is created based on a replication entry from the Accumulo metadata table, and is deleted from the replication table
+when the WAL has been fully replicated to all remote Accumulo tables.</p>
+
+<h3 id="work-section">Work section</h3>
+
+<p>This section is for the tracking of a WAL file that needs to be replicated to a single Accumulo table in a remote
+Accumulo cluster. If a WAL must be replicated to multiple tables, there will be multiple entries. The Value for this
+Key is a serialized ProtocolBuffer message which encapsulates the portion of the WAL which was already sent for
+this file. The “replication target” is the unique location of where the file needs to be replicated: the identifier
+for the remote Accumulo cluster and the table ID in that remote Accumulo cluster. The protocol buffer in the value
+tracks the progress of replication to the remote cluster.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;HDFS_uri_to_WAL&gt; work:&lt;replication_target&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+
+<p>The “work” entry is created when a WAL has an “order” entry, and deleted after the WAL is replicated to all
+necessary remote clusters.</p>
+
+<h3 id="order-section">Order section</h3>
+
+<p>This section is used to order and schedule (create) replication work. In some cases, data with the same timestamp
+may be provided multiple times. When that happens, it is important that WALs are replicated in the same order they were
+created/used. Whether or not ordering matters for the data, the order entry ensures that the oldest WALs
+are processed first and pushed through the replication framework.</p>
+
+<div class="highlighter-rouge"><pre class="highlight"><code>&lt;time_of_WAL_closing&gt;\x00&lt;HDFS_uri_to_WAL&gt; order:&lt;local_table_id&gt; [] -&gt; &lt;protobuf&gt;
+</code></pre>
+</div>
+
+<p>The “order” entry is created when the WAL is closed (no longer being written to) and is removed when
+the WAL is fully replicated to all remote locations.</p>
+
 
     <div class="row" style="margin-top: 20px;">
       <div class="col-md-10"><strong>Find documentation for all releases in the <a href="/docs-archive">archive</strong></div>

http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/fcce417a/feed.xml
----------------------------------------------------------------------
diff --git a/feed.xml b/feed.xml
index aa6eac7..83a433d 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>https://accumulo.apache.org/</link>
     <atom:link href="https://accumulo.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Wed, 02 Aug 2017 13:44:25 -0400</pubDate>
-    <lastBuildDate>Wed, 02 Aug 2017 13:44:25 -0400</lastBuildDate>
+    <pubDate>Mon, 07 Aug 2017 17:41:16 -0400</pubDate>
+    <lastBuildDate>Mon, 07 Aug 2017 17:41:16 -0400</lastBuildDate>
     <generator>Jekyll v3.3.1</generator>
     
       <item>