You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/02/01 08:05:00 UTC

[jira] [Commented] (HADOOP-18596) Distcp -update between different cloud stores to use modification time while checking for file skip.

    [ https://issues.apache.org/jira/browse/HADOOP-18596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682900#comment-17682900 ] 

ASF GitHub Bot commented on HADOOP-18596:
-----------------------------------------

mehakmeet commented on code in PR #5308:
URL: https://github.com/apache/hadoop/pull/5308#discussion_r1092876062


##########
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/DistCpConstants.java:
##########
@@ -142,6 +142,13 @@ private DistCpConstants() {
       "distcp.blocks.per.chunk";
 
   public static final String CONF_LABEL_USE_ITERATOR = "distcp.use.iterator";
+
+  /** Distcp -update to use modification time of source and target file to
+   * check while skipping.
+   */
+  public static final String CONF_LABEL_UPDATE_MOD_TIME =

Review Comment:
   Not sure if you mean ".md" file modification or the modification to already added java docs? I'll add some more info to the javadocs about the constant.





> Distcp -update between different cloud stores to use modification time while checking for file skip.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18596
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18596
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Mehakmeet Singh
>            Assignee: Mehakmeet Singh
>            Priority: Major
>              Labels: pull-request-available
>
> Distcp -update currently relies on File size, block size, and Checksum comparisons to figure out which files should be skipped or copied. 
> Since different cloud stores have different checksum algorithms we should check for modification time as well to the checks.
> This would ensure that while performing -update if the files are perceived to be out of sync we should copy them. The machines between which the file transfers occur should be in time sync to avoid any extra copies.
> Improving testing and documentation for modification time checks between different object stores to ensure no incorrect skipping of files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org