Posted to common-dev@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/05/21 12:54:00 UTC

[jira] [Resolved] (HADOOP-15300) distcp -update to WASB and ADL copies up all the files, always

     [ https://issues.apache.org/jira/browse/HADOOP-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-15300.
-------------------------------------
    Resolution: Duplicate

> distcp -update to WASB and ADL copies up all the files, always
> --------------------------------------------------------------
>
>                 Key: HADOOP-15300
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15300
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/adl, fs/azure
>    Affects Versions: 3.1.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> If you repeatedly run {{distcp -update}} against an adl or wasb store, all the source files are copied up every time. In contrast, with hdfs:// or s3a:// as the destination, only the new files are uploaded. hdfs uses checksums for the diff; s3a just returns the file length and relies on the distcp logic of "if either src or dest doesn't do checksums, only compare file length".
> Somehow that's not kicking in for wasb and adl. Tested with file: and hdfs sources, and wasb and adl destinations.
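
The rule quoted in the description can be sketched as below. This is only an illustration of the "if either side lacks checksums, compare length only" rule as the report states it, not the actual DistCp code; the class UpdateSkipSketch, the type FileInfo and the method canSkipCopy are hypothetical names, and a null checksum is an assumed way of modelling a store that does not expose one.

```java
public class UpdateSkipSketch {

    /** Hypothetical stand-in for file metadata; not a Hadoop type. */
    public static final class FileInfo {
        final long length;
        final String checksum; // null models "store exposes no checksum" (assumption)

        public FileInfo(long length, String checksum) {
            this.length = length;
            this.checksum = checksum;
        }
    }

    /**
     * Returns true if an -update style copy could skip this file,
     * following only the rule quoted in the issue description.
     */
    public static boolean canSkipCopy(FileInfo src, FileInfo dst) {
        // Different lengths always mean the file must be copied.
        if (src.length != dst.length) {
            return false;
        }
        // If either side has no checksum, fall back to the length comparison alone.
        if (src.checksum == null || dst.checksum == null) {
            return true;
        }
        // Both sides have checksums: they must match for the copy to be skipped.
        return src.checksum.equals(dst.checksum);
    }

    public static void main(String[] args) {
        FileInfo local = new FileInfo(1024L, null);
        FileInfo remoteSameLength = new FileInfo(1024L, null);
        FileInfo remoteShorter = new FileInfo(512L, null);

        System.out.println(canSkipCopy(local, remoteSameLength)); // true
        System.out.println(canSkipCopy(local, remoteShorter));    // false
    }
}
```

Under this rule, a destination that reports only the file length should still let unchanged files be skipped on a repeated run, which is the behaviour the report says is missing for wasb and adl.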



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
