Posted to common-issues@hadoop.apache.org by "Steve Loughran (Jira)" <ji...@apache.org> on 2020/04/02 15:20:00 UTC

[jira] [Commented] (HADOOP-16756) distcp -update to S3A always overwrites due to block size mismatch

    [ https://issues.apache.org/jira/browse/HADOOP-16756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073818#comment-17073818 ] 

Steve Loughran commented on HADOOP-16756:
-----------------------------------------

This is a regression caused by HADOOP-8143, which made distcp preserve block size by default. For every store where we make the block size up (all the object stores), the source and destination block sizes don't match, so -update is forced to copy the file.
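
For anyone following along, here is a rough, standalone sketch of the kind of skip check being discussed. It is not the actual CopyMapper code: the class, the Status type and the canSkip signature are made up for illustration, checksum comparison is left out, and the 128MB source block size is just an assumed HDFS-style value. Only the 32MB S3A figure comes from the issue description.

```java
// Simplified model of a distcp -update skip decision (hypothetical names).
class SkipCheckSketch {

    /** Minimal stand-in for a file status: only length and block size. */
    static final class Status {
        final long length;
        final long blockSize;

        Status(long length, long blockSize) {
            this.length = length;
            this.blockSize = blockSize;
        }
    }

    /**
     * Returns true if the copy can be skipped. Block size is only compared
     * when it is being preserved; checksums are ignored in this sketch.
     */
    static boolean canSkip(Status source, Status target, boolean preserveBlockSize) {
        boolean sameLength = source.length == target.length;
        boolean sameBlockSize = !preserveBlockSize
                || source.blockSize == target.blockSize;
        return sameLength && sameBlockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        Status source = new Status(10 * mb, 128 * mb); // assumed HDFS-style block size
        Status target = new Status(10 * mb, 32 * mb);  // S3A reports 32MB per the issue

        // Preserving block size: the sizes differ, so the file is copied again.
        System.out.println(canSkip(source, target, true));  // false
        // Not preserving block size: equal lengths are enough to skip.
        System.out.println(canSkip(source, target, false)); // true
    }
}
```

With block size preservation on, an unchanged file of identical length is still treated as different, which matches the always-overwrite behaviour reported below.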

> distcp -update to S3A always overwrites due to block size mismatch
> ------------------------------------------------------------------
>
>                 Key: HADOOP-16756
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16756
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, tools/distcp
>    Affects Versions: 3.3.0
>            Reporter: Daisuke Kobayashi
>            Priority: Major
>
> Distcp over S3A always copies all source files, whether or not the files have changed. This contradicts the statement in the doc below.
> [http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]
> {noformat}
> And to use -update to only copy changed files.
> {noformat}
> CopyMapper compares the block size as well as the file length before copying. While the file length should match, the block size does not, apparently because the block size returned by S3A is always 32MB.
> [https://github.com/apache/hadoop/blob/release-3.2.0-RC1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java#L348]
> I suppose we should either update the documentation or change the code.


