Posted to common-issues@hadoop.apache.org by "Daryn Sharp (Jira)" <ji...@apache.org> on 2020/02/18 17:32:00 UTC

[jira] [Commented] (HADOOP-16776) backport HADOOP-16775 (distcp unique files) to branch-2

    [ https://issues.apache.org/jira/browse/HADOOP-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039267#comment-17039267 ] 

Daryn Sharp commented on HADOOP-16776:
--------------------------------------

HADOOP-16775 (of which this Jira is a backport) does not clearly explain the severity: *distcp copies to s3 will be randomly corrupted*. Every file other than the first one copied by each task risks being a duplicate of a file previously copied by that same task. It happens surprisingly often, and the job _does not fail_.

[~stevel@apache.org], please explain this circular logic:
bq.  I don't Think back reporting is this is justified. It's just a safety measure for people who aren't using -direct
OK, sounds great, but you blocked adding the -direct flag in HADOOP-15281:
bq. Closing as fixed. I'm not going apply the -direct option to branch-2: if you want to work with cloud stores, run, don't walk to branch-3
So I can't have the fix and I can't have the -direct workaround...

I'm appalled and dismayed. You're blocking fixes for a critical data corruption bug due to a personal interest in advancing branch-3? We've been telling customers for months that it was impossible for distcp to copy the wrong data and that they must be overwriting the s3 destination.

> backport HADOOP-16775 (distcp unique files) to branch-2
> -------------------------------------------------------
>
>                 Key: HADOOP-16776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16776
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.8.0, 3.0.0
>            Reporter: Amir Shenavandeh
>            Priority: Major
>              Labels: DistCp
>         Attachments: HADOOP-16776-branch-2.8-001.patch, HADOOP-16776-branch-2.8-002.patch
>
>
> This is to back port HADOOP-16775 to hadoop 2.8 branch.
>  
