Posted to commits@falcon.apache.org by ba...@apache.org on 2016/05/13 16:09:46 UTC
falcon git commit: FALCON-1908 Document HDFS snapshot based mirroring extension
Repository: falcon
Updated Branches:
refs/heads/master 7b78c39eb -> ed410e841
FALCON-1908 Document HDFS snapshot based mirroring extension
Author: bvellanki <bv...@hortonworks.com>
Reviewers: "Ying Zheng <yz...@hortonworks.com>"
Closes #139 from bvellanki/FALCON-1908
Project: http://git-wip-us.apache.org/repos/asf/falcon/repo
Commit: http://git-wip-us.apache.org/repos/asf/falcon/commit/ed410e84
Tree: http://git-wip-us.apache.org/repos/asf/falcon/tree/ed410e84
Diff: http://git-wip-us.apache.org/repos/asf/falcon/diff/ed410e84
Branch: refs/heads/master
Commit: ed410e841b8465af45dbef236e83db5618508816
Parents: 7b78c39
Author: bvellanki <bv...@hortonworks.com>
Authored: Fri May 13 09:09:41 2016 -0700
Committer: bvellanki <bv...@hortonworks.com>
Committed: Fri May 13 09:09:41 2016 -0700
----------------------------------------------------------------------
docs/src/site/twiki/Extensions.twiki | 1 +
docs/src/site/twiki/HdfsSnapshotMirroring.twiki | 93 ++++++++++++++++++++
2 files changed, 94 insertions(+)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/Extensions.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/Extensions.twiki b/docs/src/site/twiki/Extensions.twiki
index 8e74321..cf88c87 100644
--- a/docs/src/site/twiki/Extensions.twiki
+++ b/docs/src/site/twiki/Extensions.twiki
@@ -43,6 +43,7 @@ Sample extensions are published in addons/extensions
---++ Types of extensions
* [[HDFSMirroring][HDFS mirroring extension]]
* [[HiveMirroring][Hive mirroring extension]]
+ * [[HdfsSnapshotMirroring][HDFS snapshot based mirroring extension]]
---++ Packaging and installation
http://git-wip-us.apache.org/repos/asf/falcon/blob/ed410e84/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
----------------------------------------------------------------------
diff --git a/docs/src/site/twiki/HdfsSnapshotMirroring.twiki b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
new file mode 100644
index 0000000..ec4f16c
--- /dev/null
+++ b/docs/src/site/twiki/HdfsSnapshotMirroring.twiki
@@ -0,0 +1,93 @@
+---+HDFS Snapshot based Mirroring
+
+---++Overview
+HDFS snapshots are very cheap to create (the cost is O(1), excluding inode lookup time). Once a snapshot exists, HDFS can
+efficiently find the modifications made relative to it, and only those modifications need to be copied for disaster recovery (DR).
+This makes snapshots a cost-effective basis for HDFS mirroring.
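To illustrate the "find modifications relative to a snapshot" idea, here is a minimal sketch using the standard HDFS snapshot commands. The helper names `run` and `show_changes_since` and the `DRY_RUN` switch are hypothetical conveniences for this sketch (not Falcon or Hadoop options); with `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: take snapshot NEW of DIR and report what changed since OLD.
show_changes_since() {
  local dir="$1" old="$2" new="$3"
  # Creating a snapshot only records metadata; no data blocks are copied.
  run hdfs dfs -createSnapshot "$dir" "$new"
  # The diff report marks entries as created (+), deleted (-), modified (M) or renamed (R).
  run hdfs snapshotDiff "$dir" "$old" "$new"
}
```

For example, `DRY_RUN=1 show_changes_since /apps/falcon/snapshots/source s1 s2` prints the two commands it would run; `s1` and `s2` are example snapshot names.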
+
+---++Prerequisites
+The following are the prerequisites for using HDFS snapshot based mirroring.
+
+ * Hadoop version 2.7.0 or higher.
+ * The user submitting and scheduling the Falcon snapshot based mirroring job must have permission to create and manage snapshots on both the source and target directories.
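The Hadoop version prerequisite can be checked with a small sketch like the following. `version_at_least` and `current_hadoop_version` are hypothetical helper names; the comparison relies on GNU `sort -V` (version sort), and assumes `hadoop version` prints a first line such as `Hadoop 2.7.3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version HAVE >= version NEED.
version_at_least() {
  local have="$1" need="$2"
  # Under version ordering, if NEED sorts first (or ties), HAVE is at least NEED.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Hypothetical helper: extract "2.7.3" from a first line like "Hadoop 2.7.3".
current_hadoop_version() {
  hadoop version | head -n 1 | awk '{print $2}'
}
```

On a cluster node: `version_at_least "$(current_hadoop_version)" 2.7.0 && echo "Hadoop version OK"`.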
+
+---++ Use Case
+Create and manage snapshots on source/target directories. Mirror data from source to target for disaster
+recovery using these snapshots. Perform retention on the snapshots created on source and target.
+
+
+---++ Usage
+
+---+++ Setup
+ * Submit the source and target cluster entities to Falcon.
+ <verbatim>
+ $FALCON_HOME/bin/falcon entity -submit -type cluster -file source-cluster-definition.xml
+ $FALCON_HOME/bin/falcon entity -submit -type cluster -file target-cluster-definition.xml
+ </verbatim>
+ * Ensure that the source directory exists on the source cluster and the target directory exists on the target cluster.
+ * Ensure that these directories are snapshottable and that the user submitting the extension can create snapshots on them. You can find more [[https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html][information on snapshots here]].
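A directory becomes snapshottable through `hdfs dfsadmin -allowSnapshot`, which requires HDFS superuser privilege. The sketch below wraps that command; `enable_snapshots`, `run` and the `DRY_RUN` switch are hypothetical names for this sketch, and the paths are the ones used in the sample properties below.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: allow snapshots on every directory given as an argument.
# Must be run as an HDFS superuser on the cluster that owns the directories.
enable_snapshots() {
  local dir
  for dir in "$@"; do
    run hdfs dfsadmin -allowSnapshot "$dir"
  done
}
```

Run `enable_snapshots /apps/falcon/snapshots/source/` on the source cluster and `enable_snapshots /apps/falcon/snapshots/target/` on the target cluster. `hdfs lsSnapshottableDir` then lists the snapshottable directories the current user can take snapshots of.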
+
+---+++ HDFS Snapshot based mirroring extension properties
+ Extension artifacts are expected to be installed on HDFS at the path specified by "extension.store.uri" in the startup properties.
+ The file hdfs-snapshot-mirroring-properties.json, located at "<extension.store.uri>/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json",
+ lists all the required and optional parameters for scheduling the mirroring job.
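To see the full parameter list, the JSON file can be read straight from HDFS. This is a minimal sketch; `show_extension_params`, `run` and `DRY_RUN` are hypothetical names, and the caller supplies the actual value of "extension.store.uri".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: print the extension's parameter definitions.
# $1 is the value of "extension.store.uri" from the startup properties.
show_extension_params() {
  local store_uri="${1:?usage: show_extension_params <extension.store.uri>}"
  # Strip one trailing slash so the joined path has no "//".
  run hdfs dfs -cat "${store_uri%/}/hdfs-snapshot-mirroring/META/hdfs-snapshot-mirroring-properties.json"
}
```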
+
+ Here is a sample set of properties:
+ <verbatim>
+ ## Job Properties
+ jobName=hdfs-snapshot-test
+ jobClusterName=backupCluster
+ jobValidityStart=2016-01-01T00:00Z
+ jobValidityEnd=2016-04-01T00:00Z
+ jobFrequency=hours(12)
+ jobTimezone=UTC
+ jobTags=consumer=consumer@xyz.com
+ jobRetryPolicy=periodic
+ jobRetryDelay=minutes(30)
+ jobRetryAttempts=3
+
+ ## Job owner
+ jobAclOwner=ambari-qa
+ jobAclGroup=users
+ jobAclPermission=*
+
+ ## Source information
+ sourceCluster=primaryCluster
+ sourceSnapshotDir=/apps/falcon/snapshots/source/
+ sourceSnapshotRetentionPolicy=delete
+ sourceSnapshotRetentionAgeLimit=days(15)
+ sourceSnapshotRetentionNumber=10
+
+ ## Target information
+ targetCluster=backupCluster
+ targetSnapshotDir=/apps/falcon/snapshots/target/
+ targetSnapshotRetentionPolicy=delete
+ targetSnapshotRetentionAgeLimit=months(6)
+ targetSnapshotRetentionNumber=20
+
+ ## Distcp properties
+ distcpMaxMaps=1
+ distcpMapBandwidth=100
+ tdeEncryptionEnabled=false
+ </verbatim>
+
+
+With the above properties, the Falcon HDFS snapshot based mirroring extension does the following every 12 hours (jobFrequency=hours(12)):
+ * Creates a snapshot of /apps/falcon/snapshots/source/ on primaryCluster.
+ * Copies the data from /apps/falcon/snapshots/source/ on primaryCluster to /apps/falcon/snapshots/target/ on backupCluster using DistCp.
+ * Creates a snapshot of /apps/falcon/snapshots/target/ on backupCluster.
+ * Runs a retention job on the source and target snapshots.
+ * Retention keeps at least the N latest snapshots (sourceSnapshotRetentionNumber/targetSnapshotRetentionNumber) and deletes all other snapshots older than the specified age limit (sourceSnapshotRetentionAgeLimit/targetSnapshotRetentionAgeLimit).
+ * Currently, only the "delete" policy is supported for snapshot retention.
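The retention rule described above can be sketched as follows. This is an illustration of the rule as stated, not Falcon's implementation. `snapshots_to_delete` is a hypothetical helper that reads one `<creation-epoch-seconds> <snapshot-name>` pair per line on stdin and prints the names that the rule would delete.

```shell
#!/usr/bin/env bash
# Hypothetical helper: always keep the KEEP_N newest snapshots; among the rest,
# select those older than MAX_AGE_SECS (relative to NOW, default: current time).
snapshots_to_delete() {
  local keep_n="$1" max_age_secs="$2" now="${3:-$(date +%s)}"
  # Newest first, so the first KEEP_N lines are the snapshots that are always kept.
  sort -k1,1nr |
    awk -v keep="$keep_n" -v limit="$max_age_secs" -v now="$now" '
      NR > keep && (now - $1) > limit { print $2 }'
}
```

With the sample source settings (sourceSnapshotRetentionNumber=10, sourceSnapshotRetentionAgeLimit=days(15)) this would be called as `snapshots_to_delete 10 $((15 * 86400))`. The creation times could come, for example, from a listing of `<dir>/.snapshot`, and each selected name would then be removed with `hdfs dfs -deleteSnapshot <dir> <name>`.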
+
+*Note:*
+When TDE (transparent data encryption) is enabled on the source/target directories, DistCp ignores the snapshots and treats the copy as a regular
+replication. Although the user does not get the performance benefit of snapshot based DistCp, the extension is still useful
+for creating and maintaining snapshots.
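For reference, the per-cycle steps listed earlier roughly correspond to the following manual commands. This is a sketch under stated assumptions, not what Falcon runs internally: `mirror_cycle`, `run` and `DRY_RUN` are hypothetical names; snapshot names are examples (Falcon chooses its own); and distcpMaxMaps/distcpMapBandwidth are assumed to map to DistCp's `-m` and `-bandwidth` (MB per map) options. DistCp's `-diff` option (Hadoop 2.7+) requires `-update`, requires the previous snapshot to exist on both sides, and requires that the target has not changed since that snapshot was taken.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: one incremental mirroring cycle from SRC to TGT.
# PREV must already exist as a snapshot on both SRC and TGT.
mirror_cycle() {
  local src="$1" tgt="$2" prev="$3" next="$4" maps="${5:-1}" bw_mb="${6:-100}"
  # 1. Snapshot the source.
  run hdfs dfs -createSnapshot "$src" "$next"
  # 2. Apply only the changes between PREV and NEXT to the target.
  run hadoop distcp -update -diff "$prev" "$next" -m "$maps" -bandwidth "$bw_mb" "$src" "$tgt"
  # 3. Snapshot the target under the same name, ready for the next cycle.
  run hdfs dfs -createSnapshot "$tgt" "$next"
}
```

The very first cycle has no previous snapshot, so it would need a full copy (for example `hadoop distcp` without `-diff`) followed by matching snapshots on both sides.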
+
+---+++ Submit and schedule HDFS snapshot mirroring extension
+The extension can be submitted using the CLI or the REST API. The CLI command looks as follows:
+ <verbatim>
+ $FALCON_HOME/bin/falcon extension -submitAndSchedule -extensionName hdfs-snapshot-mirroring -file properties-file.txt
+ </verbatim>
+ Please refer to [[falconcli/FalconCLI][Falcon CLI]] and [[restapi/ResourceList][REST API]] for more details on using the CLI and REST APIs.
\ No newline at end of file