Posted to dev@falcon.apache.org by Sowmya Ramesh <sr...@hortonworks.com> on 2015/06/15 20:10:42 UTC

Review Request 35468: FeedReplicator improvement to include more DistCP options

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35468/
-----------------------------------------------------------

Review request for Falcon, Pallavi Rao, Srikanth Sundarrajan, and Venkat Ranganathan.


Bugs: FALCON-668
    https://issues.apache.org/jira/browse/FALCON-668


Repository: falcon-git


Description
-------

FeedReplicator improvement to include more DistCp options, listed below. The user is expected to pass these as custom properties in the feed defined for replication, and they will be propagated to the DistCp tool.

* overwrite
* ignore errors
* skip checksum
* remove deleted files
* preserve block size 
* preserve replication count
* preserve permissions


Diffs
-----

  common/src/main/java/org/apache/falcon/util/ReplicationConstants.java PRE-CREATION 
  docs/src/site/twiki/EntitySpecification.twiki 1ed2cb5 
  oozie/src/main/java/org/apache/falcon/oozie/OozieOrchestrationWorkflowBuilder.java 49f9e07 
  oozie/src/main/java/org/apache/falcon/oozie/feed/FSReplicationWorkflowBuilder.java 1d97204 
  oozie/src/main/java/org/apache/falcon/oozie/feed/HCatReplicationWorkflowBuilder.java 72bbca4 
  replication/src/main/java/org/apache/falcon/replication/FeedReplicator.java b2175b2 
  replication/src/test/java/org/apache/falcon/replication/FeedReplicatorTest.java 539d00d 

Diff: https://reviews.apache.org/r/35468/diff/


Testing
-------

Verified that the custom properties added to the feed were propagated to the FeedReplicator Java action and that the replication succeeded.

Action configuration:

<main-class>org.apache.falcon.replication.FeedReplicator</main-class>
  <arg>-Dfalcon.include.path=hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
  <arg>-Dmapred.job.queue.name=default</arg>
  <arg>-Dmapred.job.priority=NORMAL</arg>
  <arg>-maxMaps</arg>
  <arg>33</arg>
  <arg>-mapBandwidth</arg>
  <arg>2</arg>
  <arg>-sourcePaths</arg>
  <arg>hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
  <arg>-targetPath</arg>
  <arg>hdfs://sandbox.hortonworks.com:8020/user/ambari-qa/dr/test/backupCluster/feed/output/2015/06/08</arg>
  <arg>-falconFeedStorageType</arg>
  <arg>FILESYSTEM</arg>
  <arg>-availabilityFlag</arg>
  <arg>NA</arg>
  <arg>-overwrite</arg>
  <arg>true</arg>
  <arg>-ignoreErrors</arg>
  <arg>false</arg>
  <arg>-preservePermissions</arg>
  <arg>false</arg>
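
The action configuration above passes each replication option as a flag followed by its value (for example, -overwrite then true). As a rough illustration of how a receiver could read such pairs, here is a minimal sketch. It is not the FeedReplicator code from this patch: the class name ReplicationArgsSketch, its methods, and the choice to treat a missing option as false are all assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Falcon. */
final class ReplicationArgsSketch {

    /**
     * Reads "-name value" pairs into a map keyed by name (leading '-' removed).
     * Arguments starting with "-D" are treated as single-token JVM-style
     * definitions and skipped.
     */
    static Map<String, String> parsePairs(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            if (arg.startsWith("-D")) {
                i++; // e.g. -Dmapred.job.queue.name=default carries its own value
            } else if (arg.startsWith("-") && i + 1 < args.length) {
                options.put(arg.substring(1), args[i + 1]);
                i += 2;
            } else {
                throw new IllegalArgumentException("Unexpected argument: " + arg);
            }
        }
        return options;
    }

    /** Returns the boolean value of an option; absent options count as false (an assumption). */
    static boolean flag(Map<String, String> options, String name) {
        return Boolean.parseBoolean(options.getOrDefault(name, "false"));
    }

    public static void main(String[] args) {
        String[] sample = {
            "-Dmapred.job.queue.name=default",
            "-maxMaps", "33",
            "-overwrite", "true",
            "-ignoreErrors", "false",
            "-preservePermissions", "false"
        };
        Map<String, String> options = parsePairs(sample);
        System.out.println("overwrite=" + flag(options, "overwrite"));
        System.out.println("ignoreErrors=" + flag(options, "ignoreErrors"));
    }
}
```

Keeping the parsed values in a plain map keeps the sketch independent of any Hadoop classes; a real implementation would presumably hand the values to DistCp's own option object instead.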


Thanks,

Sowmya Ramesh


Re: Review Request 35468: FeedReplicator improvement to include more DistCP options

Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35468/#review88351
-----------------------------------------------------------

Ship it!

- Pallavi Rao


On June 16, 2015, 10:18 p.m., Sowmya Ramesh wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35468/
> -----------------------------------------------------------
> 
> (Updated June 16, 2015, 10:18 p.m.)
> 
> 
> Review request for Falcon, Pallavi Rao, Srikanth Sundarrajan, and Venkat Ranganathan.
> 
> 
> Bugs: FALCON-668
>     https://issues.apache.org/jira/browse/FALCON-668
> 
> 
> Repository: falcon-git
> 
> 
> Description
> -------
> 
> FeedReplicator improvement to include more DistCp options, listed below. The user is expected to pass these as custom properties in the feed defined for replication, and they will be propagated to the DistCp tool.
> 
> * overwrite
> * ignore errors
> * skip checksum
> * remove deleted files
> * preserve block size 
> * preserve replication count
> * preserve permissions
> 
> 
> Diffs
> -----
> 
>   common/src/main/java/org/apache/falcon/util/ReplicationDistCpOption.java PRE-CREATION 
>   docs/src/site/twiki/EntitySpecification.twiki 1ed2cb5 
>   oozie/src/main/java/org/apache/falcon/oozie/OozieOrchestrationWorkflowBuilder.java 49f9e07 
>   oozie/src/main/java/org/apache/falcon/oozie/feed/FSReplicationWorkflowBuilder.java 1d97204 
>   oozie/src/main/java/org/apache/falcon/oozie/feed/HCatReplicationWorkflowBuilder.java 72bbca4 
>   replication/src/main/java/org/apache/falcon/replication/FeedReplicator.java b2175b2 
>   replication/src/test/java/org/apache/falcon/replication/FeedReplicatorTest.java 539d00d 
> 
> Diff: https://reviews.apache.org/r/35468/diff/
> 
> 
> Testing
> -------
> 
> Verified custom property added got propagated to FeedReplicator Java action and the replication succeeded.
> 
> Action configuration:
> 
> <main-class>org.apache.falcon.replication.FeedReplicator</main-class>
>   <arg>-Dfalcon.include.path=hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
>   <arg>-Dmapred.job.queue.name=default</arg>
>   <arg>-Dmapred.job.priority=NORMAL</arg>
>   <arg>-maxMaps</arg>
>   <arg>33</arg>
>   <arg>-mapBandwidth</arg>
>   <arg>2</arg>
>   <arg>-sourcePaths</arg>
>   <arg>hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
>   <arg>-targetPath</arg>
>   <arg>hdfs://sandbox.hortonworks.com:8020/user/ambari-qa/dr/test/backupCluster/feed/output/2015/06/08</arg>
>   <arg>-falconFeedStorageType</arg>
>   <arg>FILESYSTEM</arg>
>   <arg>-availabilityFlag</arg>
>   <arg>NA</arg>
>   <arg>-overwrite</arg>
>   <arg>true</arg>
>   <arg>-ignoreErrors</arg>
>   <arg>false</arg>
>   <arg>-preservePermissions</arg>
>   <arg>false</arg>
> 
> 
> Thanks,
> 
> Sowmya Ramesh
> 
>


Re: Review Request 35468: FeedReplicator improvement to include more DistCP options

Posted by Sowmya Ramesh <sr...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35468/
-----------------------------------------------------------

(Updated June 16, 2015, 10:18 p.m.)


Review request for Falcon, Pallavi Rao, Srikanth Sundarrajan, and Venkat Ranganathan.


Changes
-------

Applied feedback!


Bugs: FALCON-668
    https://issues.apache.org/jira/browse/FALCON-668


Repository: falcon-git


Description
-------

FeedReplicator improvement to include more DistCp options, listed below. The user is expected to pass these as custom properties in the feed defined for replication, and they will be propagated to the DistCp tool.

* overwrite
* ignore errors
* skip checksum
* remove deleted files
* preserve block size 
* preserve replication count
* preserve permissions


Diffs (updated)
-----

  common/src/main/java/org/apache/falcon/util/ReplicationDistCpOption.java PRE-CREATION 
  docs/src/site/twiki/EntitySpecification.twiki 1ed2cb5 
  oozie/src/main/java/org/apache/falcon/oozie/OozieOrchestrationWorkflowBuilder.java 49f9e07 
  oozie/src/main/java/org/apache/falcon/oozie/feed/FSReplicationWorkflowBuilder.java 1d97204 
  oozie/src/main/java/org/apache/falcon/oozie/feed/HCatReplicationWorkflowBuilder.java 72bbca4 
  replication/src/main/java/org/apache/falcon/replication/FeedReplicator.java b2175b2 
  replication/src/test/java/org/apache/falcon/replication/FeedReplicatorTest.java 539d00d 

Diff: https://reviews.apache.org/r/35468/diff/


Testing
-------

Verified custom property added got propagated to FeedReplicator Java action and the replication succeeded.

Action configuration:

<main-class>org.apache.falcon.replication.FeedReplicator</main-class>
  <arg>-Dfalcon.include.path=hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
  <arg>-Dmapred.job.queue.name=default</arg>
  <arg>-Dmapred.job.priority=NORMAL</arg>
  <arg>-maxMaps</arg>
  <arg>33</arg>
  <arg>-mapBandwidth</arg>
  <arg>2</arg>
  <arg>-sourcePaths</arg>
  <arg>hftp://sandbox.hortonworks.com:50070/user/ambari-qa/dr/test/primaryCluster/feed/input/2015/06/08</arg>
  <arg>-targetPath</arg>
  <arg>hdfs://sandbox.hortonworks.com:8020/user/ambari-qa/dr/test/backupCluster/feed/output/2015/06/08</arg>
  <arg>-falconFeedStorageType</arg>
  <arg>FILESYSTEM</arg>
  <arg>-availabilityFlag</arg>
  <arg>NA</arg>
  <arg>-overwrite</arg>
  <arg>true</arg>
  <arg>-ignoreErrors</arg>
  <arg>false</arg>
  <arg>-preservePermissions</arg>
  <arg>false</arg>


Thanks,

Sowmya Ramesh


Re: Review Request 35468: FeedReplicator improvement to include more DistCP options

Posted by Pallavi Rao <pa...@inmobi.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35468/#review88025
-----------------------------------------------------------



docs/src/site/twiki/EntitySpecification.twiki (line 293)
<https://reviews.apache.org/r/35468/#comment140431>

    The DistCp options do not exactly match the options you have listed above. For example, DistCp's -pr corresponds to Falcon's preserveReplicationNumber. Given that, I think the Falcon documentation should list the options as Falcon supports them.
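
To make the naming mismatch in the comment above concrete, the following sketch pairs Falcon-side option names with the DistCp command-line switches they would correspond to. The switches (-overwrite, -i, -skipcrccheck, -delete, -pb, -pr, -pp) are standard DistCp options. On the Falcon side, only overwrite, ignoreErrors, preservePermissions and preserveReplicationNumber appear in this thread; skipChecksum, removeDeletedFiles and preserveBlockSize are guessed from the description's bullet list, and the class DistCpSwitchSketch itself is hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup table for illustration only; not part of Falcon. */
final class DistCpSwitchSketch {

    /** Falcon-side option name -> DistCp command-line switch. */
    static final Map<String, String> FALCON_TO_DISTCP;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("overwrite", "-overwrite");
        m.put("ignoreErrors", "-i");
        m.put("skipChecksum", "-skipcrccheck");       // Falcon name assumed
        m.put("removeDeletedFiles", "-delete");       // Falcon name assumed
        m.put("preserveBlockSize", "-pb");            // Falcon name assumed
        m.put("preserveReplicationNumber", "-pr");
        m.put("preservePermissions", "-pp");
        FALCON_TO_DISTCP = Collections.unmodifiableMap(m);
    }

    /** Looks up the DistCp switch for a Falcon option name, rejecting unknown names. */
    static String distCpSwitch(String falconName) {
        String distCpName = FALCON_TO_DISTCP.get(falconName);
        if (distCpName == null) {
            throw new IllegalArgumentException("Unsupported option: " + falconName);
        }
        return distCpName;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : FALCON_TO_DISTCP.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```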



oozie/src/main/java/org/apache/falcon/oozie/OozieOrchestrationWorkflowBuilder.java (line 150)
<https://reviews.apache.org/r/35468/#comment140432>

    Repetitive code. Maybe we can define the supported replication options as an enum and have a single for loop iterate over the enum values.
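
One way the enum-plus-loop suggestion above could look is sketched here. This is not the code that was committed for this review: the enum name ReplicationOptionSketch, the assumption that each feed property key equals the option name, and the option names other than overwrite, ignoreErrors, preservePermissions and preserveReplicationNumber are all invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical enum of supported replication options; names are assumptions. */
enum ReplicationOptionSketch {
    OVERWRITE("overwrite"),
    IGNORE_ERRORS("ignoreErrors"),
    SKIP_CHECKSUM("skipChecksum"),
    REMOVE_DELETED_FILES("removeDeletedFiles"),
    PRESERVE_BLOCK_SIZE("preserveBlockSize"),
    PRESERVE_REPLICATION_NUMBER("preserveReplicationNumber"),
    PRESERVE_PERMISSIONS("preservePermissions");

    private final String optionName;

    ReplicationOptionSketch(String optionName) {
        this.optionName = optionName;
    }

    String getOptionName() {
        return optionName;
    }

    /**
     * Builds "-name value" argument pairs for every option present in the
     * feed properties. One loop over values() replaces a block of
     * per-option if statements.
     */
    static List<String> toArgs(Map<String, String> feedProperties) {
        List<String> args = new ArrayList<>();
        for (ReplicationOptionSketch option : values()) {
            String value = feedProperties.get(option.getOptionName());
            if (value != null) {
                args.add("-" + option.getOptionName());
                args.add(value);
            }
        }
        return args;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("overwrite", "true");
        props.put("ignoreErrors", "false");
        props.put("queueName", "default"); // unrelated property, ignored by toArgs
        System.out.println(toArgs(props));
    }
}
```

With this shape, supporting a new option means adding one enum constant; the loop that builds the arguments does not change.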


- Pallavi Rao

