Posted to commits@accumulo.apache.org by el...@apache.org on 2015/01/22 16:45:20 UTC

accumulo git commit: ACCUMULO-3500 Update replication docs for bulk imports

Repository: accumulo
Updated Branches:
  refs/heads/master 4b1196257 -> 80805545e


ACCUMULO-3500 Update replication docs for bulk imports


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/80805545
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/80805545
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/80805545

Branch: refs/heads/master
Commit: 80805545e7617bed41bfd5f50c0ba8032fd71d91
Parents: 4b11962
Author: Josh Elser <el...@apache.org>
Authored: Thu Jan 22 10:39:41 2015 -0500
Committer: Josh Elser <el...@apache.org>
Committed: Thu Jan 22 10:39:41 2015 -0500

----------------------------------------------------------------------
 docs/src/main/asciidoc/chapters/replication.txt | 10 ++++++++++
 1 file changed, 10 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/80805545/docs/src/main/asciidoc/chapters/replication.txt
----------------------------------------------------------------------
diff --git a/docs/src/main/asciidoc/chapters/replication.txt b/docs/src/main/asciidoc/chapters/replication.txt
index 48f6ffa..69bb3c4 100644
--- a/docs/src/main/asciidoc/chapters/replication.txt
+++ b/docs/src/main/asciidoc/chapters/replication.txt
@@ -377,3 +377,13 @@ As is the recommendation without replication enabled, if multiple values for the
 Accumulo, it is strongly recommended that the value in the timestamp properly reflects the intended version by
 the client. That is to say, newer values inserted into the table should have larger timestamps. If the time between
 writing updates to the same key is significant (order minutes), this concern can likely be ignored.
+
+==== Bulk Imports
+
+Currently, files that are bulk imported into a table configured for replication are not replicated. There is no
+technical reason why it was not implemented; it was simply omitted from the initial implementation. This is considered a
+fair limitation because bulk importing generated files into multiple locations is much simpler than bifurcating "live"
+ingest data into two instances. Given some existing bulk import process which creates files and then imports them into an
+Accumulo instance, it is trivial to copy those files to a new HDFS instance and import them into another Accumulo
+instance using the same process. Hadoop's +distcp+ command provides an easy way to copy large amounts of data to another
+HDFS instance, which makes the problem of duplicating bulk imports very easy to solve.
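
The copy step described in the Bulk Imports section could be scripted roughly as follows. This is only a minimal sketch, not part of the commit: the NameNode addresses, paths, and function names are hypothetical, and the only external command assumed is the standard `hadoop distcp <source> <target>` invocation. By default it runs in dry-run mode and just returns the command it would execute.

```python
import subprocess


def build_distcp_command(source_dir, target_dir):
    """Return the argument list for `hadoop distcp <source> <target>`."""
    return ["hadoop", "distcp", source_dir, target_dir]


def copy_bulk_files(source_dir, target_dir, dry_run=True):
    """Copy generated bulk-import files to the peer's HDFS instance.

    With dry_run=True the command is only returned, not executed.
    """
    command = build_distcp_command(source_dir, target_dir)
    if not dry_run:
        # Raises CalledProcessError if distcp exits with a non-zero status.
        subprocess.check_call(command)
    return command


# Hypothetical NameNode addresses and paths, for illustration only.
PRIMARY_FILES = "hdfs://primary-nn:8020/bulk/batch1/files"
PEER_FILES = "hdfs://peer-nn:8020/bulk/batch1/files"

if __name__ == "__main__":
    # Dry run: print the distcp command without executing it.
    print(" ".join(copy_bulk_files(PRIMARY_FILES, PEER_FILES)))
```

Once the copy has finished, the existing bulk import process would be run again against the peer Accumulo instance, pointed at the copied directory.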