Posted to common-issues@hadoop.apache.org by "Erik Krogen (JIRA)" <ji...@apache.org> on 2017/02/27 16:51:45 UTC
[jira] [Comment Edited] (HADOOP-14086) Improve DistCp Speed for small files
[ https://issues.apache.org/jira/browse/HADOOP-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15886105#comment-15886105 ]
Erik Krogen edited comment on HADOOP-14086 at 2/27/17 4:51 PM:
---------------------------------------------------------------
[~zhz] Currently, multiple calls are made for each file; even reducing a distcp of 1M files to 1M {{getFileInfo}} calls would be a big improvement over the current implementation.
[~stevel@apache.org], what about this JIRA makes you worry that object store performance will get worse? Nothing stands out to me, so I am curious. Also, are you saying that the {{listFiles}} performance work is already done, or is it still in progress? Do you have a JIRA link? It sounds very interesting.
> Improve DistCp Speed for small files
> ------------------------------------
>
> Key: HADOOP-14086
> URL: https://issues.apache.org/jira/browse/HADOOP-14086
> Project: Hadoop Common
> Issue Type: Improvement
> Components: tools/distcp
> Affects Versions: 2.6.5
> Reporter: Zheng Shao
> Assignee: Zheng Shao
> Priority: Minor
>
> When using distcp to copy lots of small files, the NameNode naturally becomes a bottleneck.
> The current distcp code does *not* try to minimize NameNode calls. We should restructure it to make as few NameNode calls as possible, which would speed up copying small files.
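
To make the call-count argument in this thread concrete, here is a minimal, self-contained sketch that only counts metadata round trips. It does not use the Hadoop API: {{FakeNameNode}}, {{Batch}}, {{perFileCalls}} and {{batchedCalls}} are hypothetical names made up for illustration, and the figures of 3 calls per file and 1000 entries per listing call are assumptions, not measurements of distcp.

```java
import java.util.ArrayList;
import java.util.List;

final class NameNodeCallSketch {

    /** Hypothetical stand-in for a NameNode that counts the metadata RPCs it serves. */
    static final class FakeNameNode {
        private final int fileCount;
        private long rpcCount = 0;

        FakeNameNode(int fileCount) {
            this.fileCount = fileCount;
        }

        /** One RPC per path, similar in spirit to getFileInfo; returns a fake length. */
        long getFileInfo(String path) {
            rpcCount++;
            return path.length();
        }

        /** One RPC that returns up to batchSize paths starting at offset. */
        Batch listBatch(int offset, int batchSize) {
            rpcCount++;
            int end = Math.min(fileCount, offset + batchSize);
            List<String> paths = new ArrayList<>();
            for (int i = offset; i < end; i++) {
                paths.add("/src/part-" + i);
            }
            return new Batch(paths, end < fileCount);
        }

        long rpcCount() {
            return rpcCount;
        }
    }

    /** One page of a listing plus a flag saying whether more pages remain. */
    static final class Batch {
        final List<String> paths;
        final boolean hasMore;

        Batch(List<String> paths, boolean hasMore) {
            this.paths = paths;
            this.hasMore = hasMore;
        }
    }

    /** Total RPCs when every file costs callsPerFile separate metadata calls. */
    public static long perFileCalls(int fileCount, int callsPerFile) {
        FakeNameNode nn = new FakeNameNode(fileCount);
        for (int i = 0; i < fileCount; i++) {
            for (int c = 0; c < callsPerFile; c++) {
                nn.getFileInfo("/src/part-" + i);
            }
        }
        return nn.rpcCount();
    }

    /** Total RPCs when metadata arrives in pages of batchSize entries. */
    public static long batchedCalls(int fileCount, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        FakeNameNode nn = new FakeNameNode(fileCount);
        int offset = 0;
        Batch batch;
        do {
            batch = nn.listBatch(offset, batchSize);
            offset += batch.paths.size();
        } while (batch.hasMore);
        return nn.rpcCount();
    }

    public static void main(String[] args) {
        int files = 1_000_000; // the 1M-file case mentioned in the comment
        System.out.println("per-file, 3 calls each (assumed): " + perFileCalls(files, 3));
        System.out.println("per-file, 1 call each:            " + perFileCalls(files, 1));
        System.out.println("batched, 1000 per call (assumed): " + batchedCalls(files, 1000));
    }
}
```

Under these assumptions the per-file strategies scale linearly with the number of files, while a paged listing needs roughly fileCount / batchSize round trips. Whether a real listing call returns everything distcp needs for each file is exactly the open question in this thread, so the sketch should be read as an upper bound on the possible saving.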