
Posted to common-issues@hadoop.apache.org by "Erik Krogen (JIRA)" <ji...@apache.org> on 2017/02/09 17:11:42 UTC

[jira] [Commented] (HADOOP-13975) Allow DistCp to use MultiThreadedMapper

    [ https://issues.apache.org/jira/browse/HADOOP-13975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859827#comment-15859827 ] 

Erik Krogen commented on HADOOP-13975:
--------------------------------------

This seems very useful! Thanks for working on this.

Why are there both {{parseThreadsPerMap}} and {{parseNumThreadsPerMap}} in {{OptionsParser}}? It seems only one of them is used. Additionally, the error message is incorrect in both of them: one refers to {{MAX_MAPS}} and the other to {{NUM_LISTSTATUS_THREADS}}.
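
To illustrate the suggestion, here is a minimal sketch of what a single consolidated parse method could look like, with an error message that names the option actually being parsed. This is not the code from the patch: the class name {{ThreadsPerMapOption}}, the switch name {{threadsPerMap}}, and the use of a plain {{Map}} in place of the real command-line object are all assumptions made to keep the example self-contained.

```java
import java.util.Map;

// Hypothetical stand-in for the relevant part of OptionsParser.
class ThreadsPerMapOption {
    // Assumed switch name, for illustration only.
    static final String SWITCH = "threadsPerMap";

    // Single parse method; returns defaultValue when the switch is absent.
    static int parseNumThreadsPerMap(Map<String, String> options, int defaultValue) {
        String raw = options.get(SWITCH);
        if (raw == null) {
            return defaultValue;
        }
        int threads;
        try {
            threads = Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // The message names this option, not MAX_MAPS or another switch.
            throw new IllegalArgumentException(
                "Number of threads per map is invalid: " + raw, e);
        }
        if (threads <= 0) {
            throw new IllegalArgumentException(
                "Number of threads per map must be positive: " + raw);
        }
        return threads;
    }
}
```

Keeping one method means there is only one error message to keep in sync with the option it describes.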

> Allow DistCp to use MultiThreadedMapper
> ---------------------------------------
>
>                 Key: HADOOP-13975
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13975
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: tools/distcp
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: HADOOP-distcp-multithreaded-mapper-branch26.1.patch, HADOOP-distcp-multithreaded-mapper-branch26.2.patch, HADOOP-distcp-multithreaded-mapper-branch26.3.patch, HADOOP-distcp-multithreaded-mapper-branch26.4.patch, HADOOP-distcp-multithreaded-mapper-trunk.1.patch, HADOOP-distcp-multithreaded-mapper-trunk.2.patch, HADOOP-distcp-multithreaded-mapper-trunk.3.patch, HADOOP-distcp-multithreaded-mapper-trunk.4.patch
>
>
> Although distcp allows users to control parallelism via the number of mappers, it is sometimes desirable to run fewer mappers with more threads per mapper.  Since distcp is network bound (either by throughput or, more frequently, by the latency of creating connections, opening files, reading/writing files, and closing files), this can make each mapper much more efficient.
> That way, many resources can be shared, saving memory and connections to the NameNode.
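
The idea in the description can be sketched with a plain thread pool: when each copy is dominated by latency rather than CPU, several threads in one process can overlap their waits. The sketch below is an illustration only, not the MultiThreadedMapper integration from the patch. The class {{ThreadedCopySketch}} and its methods are hypothetical, and the "copy" is simulated with a sleep instead of real file I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of several copy threads inside one worker.
class ThreadedCopySketch {
    // Stand-in for one latency-bound copy: sleeps instead of doing I/O.
    static long copyOne(String path, long latencyMillis) throws InterruptedException {
        Thread.sleep(latencyMillis);
        return path.length(); // pretend this is the number of bytes copied
    }

    // Runs all copies on a fixed pool and returns how many completed.
    static int copyAll(List<String> paths, int threads, long latencyMillis)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String path : paths) {
                Callable<Long> task = () -> copyOne(path, latencyMillis);
                pending.add(pool.submit(task));
            }
            int completed = 0;
            for (Future<Long> result : pending) {
                result.get(); // rethrows any failure from the copy thread
                completed++;
            }
            return completed;
        } finally {
            pool.shutdown();
        }
    }
}
```

With N threads, up to N simulated copies wait on latency at the same time while sharing one process, which is the resource-sharing effect the description refers to.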


