Posted to mapreduce-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2012/07/13 01:57:36 UTC

[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13413344#comment-13413344 ] 

Arun C Murthy commented on MAPREDUCE-4049:
------------------------------------------

Avner, apologies for taking this long.

The patch looks reasonable, and small(!) which is great.

The concern I have is that this patch introduces interfaces (i.e. ShuffleProvider/ShuffleConsumer) which aren't present in hadoop-2.x. Should we do both hadoop-2 and hadoop-1 simultaneously? Otherwise, this 'feature' will break as soon as we upgrade to hadoop-2.x.

Other nits:
# We should get TaskTracker.MapOutputServlet to implement the ShuffleProvider interface, else it's very easy to break an interface if no one in the core implements it. For example, I have no idea how ShuffleProvider.taskDone or ShuffleProvider.jobDone are used.
# Minor: ShuffleProvider is misspelled in a couple of places.
# We should add the new configs for provider/consumer in mapred-default.xml.
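
To make the first and third nits concrete, here is a rough sketch of the shape I have in mind, not the patch's actual code. Only the names ShuffleProvider, taskDone and jobDone come from the patch; the method signatures, the BuiltInHttpShuffleProvider stand-in for MapOutputServlet, and the config key example.shuffle.provider.class are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contract: the real signatures are whatever the patch defines.
interface ShuffleProvider {
    void taskDone(String taskId);
    void jobDone(String jobId);
}

// Stand-in for the built-in HTTP shuffle (the role TaskTracker.MapOutputServlet
// would play). Having a core class implement the interface means a change to
// the interface fails to compile in core instead of silently breaking plugins.
class BuiltInHttpShuffleProvider implements ShuffleProvider {
    final List<String> events = new ArrayList<>();

    @Override
    public void taskDone(String taskId) {
        events.add("taskDone:" + taskId);
    }

    @Override
    public void jobDone(String jobId) {
        events.add("jobDone:" + jobId);
    }
}

final class ShuffleProviderSketch {
    // Hypothetical key; a real one would be documented in mapred-default.xml.
    static final String PROVIDER_KEY = "example.shuffle.provider.class";

    // Instantiate the configured provider, falling back to the built-in one
    // when the key is not set.
    static ShuffleProvider load(Map<String, String> conf)
            throws ReflectiveOperationException {
        String className = conf.getOrDefault(
                PROVIDER_KEY, BuiltInHttpShuffleProvider.class.getName());
        Class<? extends ShuffleProvider> cls =
                Class.forName(className).asSubclass(ShuffleProvider.class);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        ShuffleProvider provider = load(new HashMap<>());
        provider.taskDone("attempt_0001_m_000000_0");
        provider.jobDone("job_0001");
        System.out.println(provider.getClass().getSimpleName());
    }
}
```

With a default value for the key in mapred-default.xml, an unconfigured cluster would keep using the built-in shuffle, and a plugin would only need to override that one property.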

Again, apologies it took me so long to get to your patch and thanks for being super-patient! I'd like to work with you to get this committed asap!
                
> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.0.2.patch, HADOOP-1.0.x.patch, HADOOP-1.1.patch, HADOOP-1.x.y-review-oriented.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml
>
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example: we are working on a shuffle plugin that performs shuffle over RDMA in fast networks (10GbE, 40GbE, or InfiniBand) instead of using the current HTTP shuffle. Based on the fast RDMA shuffle, the plugin can also use a suitable merge approach during the intermediate merges, and hence get much better performance.
> # Satisfy MAPREDUCE-3060 - a generic shuffle service that avoids the hidden dependency of NodeManager on a specific version of mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu from Auburn University with others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch).
