Posted to common-issues@hadoop.apache.org by "Daryn Sharp (Jira)" <ji...@apache.org> on 2020/02/18 17:32:00 UTC
[jira] [Commented] (HADOOP-16776) backport HADOOP-16775 (distcp unique files) to branch-2
[ https://issues.apache.org/jira/browse/HADOOP-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039267#comment-17039267 ]
Daryn Sharp commented on HADOOP-16776:
--------------------------------------
HADOOP-16775 (of which this jira is a backport) does not clearly explain the severity: *distcp copies to s3 will be randomly corrupted*. Basically, every file other than the first one copied by each task risks being a duplicate of a file previously copied by that task. It happens surprisingly often and the job _does not fail_.
[~stevel@apache.org], please explain this circular logic:
bq. I don't think backporting this is justified. It's just a safety measure for people who aren't using -direct
Ok, sounds great, but you blocked adding the -direct flag in HADOOP-15281:
bq. Closing as fixed. I'm not going to apply the -direct option to branch-2: if you want to work with cloud stores, run, don't walk, to branch-3
So I can't have the fix and I can't have the -direct workaround...
I'm appalled and dismayed. You're blocking fixes for a critical data corruption bug due to a personal interest in advancing branch-3? We've been telling customers for months that it was impossible for distcp to copy the wrong data and they must be overwriting the s3 destination.
> backport HADOOP-16775 (distcp unique files) to branch-2
> -------------------------------------------------------
>
> Key: HADOOP-16776
> URL: https://issues.apache.org/jira/browse/HADOOP-16776
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.8.0, 3.0.0
> Reporter: Amir Shenavandeh
> Priority: Major
> Labels: DistCp
> Attachments: HADOOP-16776-branch-2.8-001.patch, HADOOP-16776-branch-2.8-002.patch
>
>
> This is to backport HADOOP-16775 to the Hadoop 2.8 branch.
>