Posted to commits@falcon.apache.org by ba...@apache.org on 2016/05/13 16:09:46 UTC

falcon git commit: FALCON-1908 Document HDFS snapshot based mirroring extension

Repository: falcon
Updated Branches:
  refs/heads/master 7b78c39eb -> ed410e841


FALCON-1908 Document HDFS snapshot based mirroring extension

Author: bvellanki <bv...@hortonworks.com>

Reviewers: "Ying Zheng <yz...@hortonworks.com>"

Closes #139 from bvellanki/FALCON-1908


Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/ed410e84
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/ed410e84
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/ed410e84

Branch: refs/heads/master
Commit: ed410e841b8465af45dbef236e83db5618508816
Parents: 7b78c39
Author: bvellanki <bv...@hortonworks.com>
Authored: Fri May 13 09:09:41 2016 -0700
Committer: bvellanki <bv...@hortonworks.com>
Committed: Fri May 13 09:09:41 2016 -0700

----------------------------------------------------------------------
 docs/src/site/twiki/Extensions.twiki            |  1 +
 docs/src/site/twiki/HdfsSnapshotMirroring.twiki | 93 ++++++++++++++++++++
 2 files changed, 94 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/Extensions.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Extensions.twiki b/docs/src/site/twiki/Extensions.twiki
index 8e74321..cf88c87 100644
--- a/docs/src/site/twiki/Extensions.twiki
+++ b/docs/src/site/twiki/Extensions.twiki
@@ -43,6 +43,7 @@ Sample extensions are published in addons/extensions
 ---++ Types of extensions
    * [[HDFSMirroring][HDFS mirroring extension]]
    * [[HiveMirroring][Hive mirroring extension]]
+   * [[HdfsSnapshotMirroring][HDFS snapshot based mirroring]]
 
 ---++ Packaging and installation
 

http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/HdfsSnapshotMirroring.twiki b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
new file mode 100644
index 0000000..ec4f16c
--- /dev/null
+++ b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
@@ -0,0 +1,93 @@
+---+HDFS Snapshot based Mirroring
+
+---++Overview
+HDFS snapshots are cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, it is
+efficient to find modifications relative to it and copy only those modifications over for disaster recovery (DR).
+This makes for cost-effective HDFS mirroring.
+
+---++Prerequisites
+The following are the prerequisites for using HDFS snapshot based mirroring.
+
+   * Hadoop version 2.7.0 or higher.
+   * The user submitting and scheduling the Falcon snapshot based mirroring job must have permission to create and manage snapshots on both the source and target directories.
+
+---++ Use Case
+Create and manage snapshots on source/target directories. Mirror data from source to target for disaster
+recovery using these snapshots. Perform retention on the snapshots created on source and target.
+
+
+---++ Usage
+
+---+++ Setup
+   * Submit the source cluster and target cluster entities to Falcon.
+   <verbatim>
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml
+    $FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml
+   </verbatim>
+   * Ensure that the source directory on the source cluster and the target directory on the target cluster exist.
+   * Ensure that these directories are snapshottable by the user submitting the extension. You can find more [[https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html][information on snapshots here]].
+
+---+++ HDFS Snapshot based mirroring extension properties
+   Extension artifacts are expected to be installed on HDFS at the path specified by "extension.store.uri" in startup properties.
+   The hdfs-snapshot-mirroring-properties.json file, located at "<extension.store.uri>/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json",
+   lists all the required and optional parameters/arguments for scheduling the mirroring job.
+
+   Here is a sample set of properties:
+   <verbatim>
+   ## Job Properties
+   jobName=hdfs-snapshot-test
+   jobClusterName=backupCluster
+   jobValidityStart=2016-01-01T00:00Z
+   jobValidityEnd=2016-04-01T00:00Z
+   jobFrequency=hours(12)
+   jobTimezone=UTC
+   jobTags=consumer=consumer@xyz.com
+   jobRetryPolicy=periodic
+   jobRetryDelay=minutes(30)
+   jobRetryAttempts=3
+
+   ## Job owner
+   jobAclOwner=ambari-qa
+   jobAclGroup=users
+   jobAclPermission=*
+
+   ## Source information
+   sourceCluster=primaryCluster
+   sourceSnapshotDir=/apps/falcon/snapshots/source/
+   sourceSnapshotRetentionPolicy=delete
+   sourceSnapshotRetentionAgeLimit=days(15)
+   sourceSnapshotRetentionNumber=10
+
+   ## Target information
+   targetCluster=backupCluster
+   targetSnapshotDir=/apps/falcon/snapshots/target/
+   targetSnapshotRetentionPolicy=delete
+   targetSnapshotRetentionAgeLimit=months(6)
+   targetSnapshotRetentionNumber=20
+
+   ## Distcp properties
+   distcpMaxMaps=1
+   distcpMapBandwidth=100
+   tdeEncryptionEnabled=false
+   </verbatim>
+
+
+The above properties ensure that the Falcon HDFS snapshot based mirroring extension does the following every 12 hours:
+   * Create a snapshot on dir /apps/falcon/snapshots/source/ on primaryCluster.
+   * DistCP data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster.
+   * Create a snapshot on dir /apps/falcon/snapshots/target/ on backupCluster.
+   * Perform a retention job on source and target.
+      * Keep at least the N latest snapshots and delete all other snapshots older than the specified age limit.
+      * Currently, only the "delete" policy is supported for snapshot retention.
+
+*Note:*
+When TDE encryption is enabled on the source/target directories, DistCP ignores the snapshots and treats the job like a regular
+replication. While the user may not get the performance benefit of snapshot based DistCP, the extension is still useful
+for creating and maintaining snapshots.
+
+---+++ Submit and schedule HDFS snapshot mirroring extension
+Users can submit the extension using the CLI or the REST API. The CLI command looks as follows:
+   <verbatim>
+    $FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file properties-file.txt
+   </verbatim>
+   Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on usage of the CLI and REST APIs.
\ No newline at end of file
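
The retention rule documented in HdfsSnapshotMirroring.twiki above (keep at least the N latest snapshots, delete the others once they are older than the age limit) can be read as the following minimal sketch. It is one plausible reading of that sentence written in Python for illustration, not Falcon's implementation; the function name `snapshots_to_delete`, the `(name, creation_time)` tuple shape, and the sample snapshot names are all assumptions.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, keep_latest, age_limit, now):
    """Return names of snapshots that a "delete" retention pass would remove.

    snapshots   -- iterable of (name, creation_time) pairs (hypothetical shape)
    keep_latest -- N: number of newest snapshots that are always kept
    age_limit   -- timedelta; only snapshots older than this may be deleted
    now         -- reference time for the age comparison
    """
    # Newest first, so the first keep_latest entries are the protected ones.
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    candidates = ordered[keep_latest:]
    # Among the unprotected snapshots, delete only those past the age limit.
    return [name for name, created in candidates if now - created > age_limit]


# Twenty hypothetical snapshots, taken every two days going back from "now".
now = datetime(2016, 5, 13, tzinfo=timezone.utc)
snapshots = [("snap-%02d" % d, now - timedelta(days=d)) for d in range(0, 40, 2)]

# Mirrors sourceSnapshotRetentionNumber=10 and
# sourceSnapshotRetentionAgeLimit=days(15) from the sample properties.
doomed = snapshots_to_delete(
    snapshots, keep_latest=10, age_limit=timedelta(days=15), now=now
)
print(doomed)
```

Under this reading, the snapshots taken 16 and 18 days ago survive even though they exceed the 15-day limit, because they are still among the 10 most recent; the age limit only applies to snapshots beyond the protected N.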