Posted to common-issues@hadoop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/05/10 21:37:00 UTC

[jira] [Updated] (HADOOP-18739) Parallelize concatenation of distcp chunks of separate files in CopyCommitter

     [ https://issues.apache.org/jira/browse/HADOOP-18739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HADOOP-18739:
------------------------------------
    Labels: pull-request-available  (was: )

> Parallelize concatenation of distcp chunks of separate files in CopyCommitter
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-18739
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18739
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>            Reporter: Abhay Yadav
>            Priority: Trivial
>              Labels: pull-request-available
>
> When copying a folder that contains large files, each split into multiple distcp chunks, CopyCommitter processes the files one at a time, concatenating the chunks of each file before moving on to the next. This step can be sped up by concatenating the chunks of separate files in parallel. With this improvement, we save 2-3 minutes when copying a 100 GB folder containing 20 files of 5 GB each.
> I am contributing a patch for this.
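
The idea described in the issue can be illustrated with a minimal, self-contained sketch. This is not the patch attached to HADOOP-18739 and does not use the distcp or Hadoop FileSystem APIs. The class ParallelChunkConcat, the methods concatChunks and concatAll, and the in-memory String chunks are all hypothetical stand-ins chosen for illustration. Chunks of one file are still joined in order; only the per-file jobs run concurrently on a fixed-size thread pool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelChunkConcat {

    // Hypothetical stand-in for the per-file concatenation step. In a real
    // copy job this would be a concat operation on the target filesystem.
    static String concatChunks(List<String> chunksInOrder) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunksInOrder) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    // Submits one concatenation task per file, then waits for all of them.
    // Chunk order within a file is preserved; files are processed in parallel.
    static Map<String, String> concatAll(Map<String, List<String>> chunksByFile,
                                         int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : chunksByFile.entrySet()) {
                List<String> chunks = e.getValue();
                Callable<String> task = () -> concatChunks(chunks);
                pending.put(e.getKey(), pool.submit(task));
            }
            Map<String, String> result = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> p : pending.entrySet()) {
                // get() rethrows any failure of that file's task as ExecutionException
                result.put(p.getKey(), p.getValue().get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, List<String>> chunksByFile = new LinkedHashMap<>();
        chunksByFile.put("fileA", List.of("a0", "a1", "a2"));
        chunksByFile.put("fileB", List.of("b0", "b1"));
        System.out.println(concatAll(chunksByFile, 2));
    }
}
```

In this sketch a failure in any one file's task surfaces when its Future is read, so the caller can fail the whole commit. How errors and the thread count are handled in the actual patch is not stated in this message.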



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
