Posted to common-dev@hadoop.apache.org by "Ayush Saxena (Jira)" <ji...@apache.org> on 2022/01/05 18:30:00 UTC

[jira] [Resolved] (HADOOP-18056) DistCp: Filter duplicates in the source paths

     [ https://issues.apache.org/jira/browse/HADOOP-18056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HADOOP-18056.
-----------------------------------
    Fix Version/s: 3.4.0
                   3.3.3
     Hadoop Flags: Reviewed
       Resolution: Fixed

> DistCp: Filter duplicates in the source paths
> ---------------------------------------------
>
>                 Key: HADOOP-18056
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18056
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.3
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Add basic filtering to remove exact duplicate paths from the source paths supplied for copying.
> If the same srcPath, say /tmp/file1, is passed in the list twice, DistCp fails with DuplicateFileException after building the listing.
> It would be better to do a basic filtering of duplicate paths.
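
The filtering described above can be illustrated with a minimal sketch. This is not the committed patch: the class name SourcePathDedup and the method filterDuplicates are hypothetical, and plain strings stand in for Hadoop Path objects so the example runs without Hadoop on the classpath. It only removes exact duplicates and keeps the order in which the paths were first given.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not the code committed for HADOOP-18056.
public class SourcePathDedup {

    // Drops exact duplicate entries while preserving first-seen order.
    // Strings stand in for org.apache.hadoop.fs.Path (an assumption made
    // to keep the sketch self-contained).
    static List<String> filterDuplicates(List<String> sourcePaths) {
        return new ArrayList<>(new LinkedHashSet<>(sourcePaths));
    }

    public static void main(String[] args) {
        // /tmp/file1 is passed twice, as in the issue description.
        List<String> srcs = List.of("/tmp/file1", "/tmp/file2", "/tmp/file1");
        System.out.println(filterDuplicates(srcs)); // prints [/tmp/file1, /tmp/file2]
    }
}
```

A LinkedHashSet is used here so that the remaining paths stay in their original order. Paths that differ textually but resolve to the same file (for example through a trailing slash or a different scheme prefix) are not treated as duplicates by this sketch.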



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
