Posted to mapreduce-issues@hadoop.apache.org by "Avner BenHanoch (JIRA)" <ji...@apache.org> on 2012/09/02 15:39:10 UTC

[jira] [Commented] (MAPREDUCE-4049) plugin for generic shuffle service

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446945#comment-13446945 ] 

Avner BenHanoch commented on MAPREDUCE-4049:
--------------------------------------------

Hi Asokan,
I have no conflict of interest with you.  Four months ago, I invited the watchers of this issue to help you commit your patch.
RDMA is not coupled with any merge, and there is no such thing as an "RDMA merge".  It is current Hadoop that couples shuffle with merge.  Rest assured, I don’t “want to retain that coupling”.  In my opinion your decoupling is the correct idea, and I encourage it.
My patch passed code review for hadoop-1 and was left with a request to cover both hadoop-2 and hadoop-1 simultaneously.  A few days ago, I submitted the patch against trunk, and it has already passed “Automatic QA”.  *I am currently waiting for code review of the trunk version.*
 
_Asokan,_
Your patch contains more than 7,000 lines, while my patch is only 400 lines.  I don’t want to wait until your patch passes Automatic QA, code review, and additional rounds.  
I have no problem with the design you suggested to me.  _However, this design can't work with the current trunk architecture, since in your design shuffle.run() returns void rather than an iterator (you rely on the merger to return the iterator)._  
I suggest that you continue with your patch on top of mine.  If you need my help with the integration, I will be honored to assist.   
I am open to any idea that you or someone else may have.

Thanks,
 Avner

                
> plugin for generic shuffle service
> ----------------------------------
>
>                 Key: MAPREDUCE-4049
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4049
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: performance, task, tasktracker
>    Affects Versions: 1.0.3, 1.1.0, 2.0.0-alpha, 3.0.0
>            Reporter: Avner BenHanoch
>              Labels: merge, plugin, rdma, shuffle
>         Attachments: HADOOP-1.x.y.patch, Hadoop Shuffle Consumer Plugin TLD.rtf, Hadoop Shuffle Provider Plugin TLD.rtf, mapred-site.xml, mapreduce-4049.patch, mapreduce-4049.patch
>
>
> Support a generic shuffle service as a set of two plugins: ShuffleProvider & ShuffleConsumer.
> This will satisfy the following needs:
> # Better shuffle and merge performance. For example, we are working on a shuffle plugin that performs shuffle over RDMA on fast networks (10GbE, 40GbE, or InfiniBand) instead of using the current HTTP shuffle. Building on the fast RDMA shuffle, the plugin can also use a suitable merge approach during the intermediate merges, and hence achieve much better performance.
> # Satisfy MAPREDUCE-3060: a generic shuffle service that avoids the hidden dependency of the NodeManager on a specific version of the mapreduce shuffle (currently targeted to 0.24.0).
> References:
> # Hadoop Acceleration through Network Levitated Merging, by Prof. Weikuan Yu of Auburn University and others, [http://pasl.eng.auburn.edu/pubs/sc11-netlev.pdf]
> # I am attaching 2 documents with a suggested Top Level Design for both plugins (currently based on the 1.0 branch)
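
Illustrative sketch (not part of either patch): the comment above turns on whether shuffle.run() returns an iterator or returns void and leaves the iterator to a merger. The Java below shows the two contract shapes side by side. Every type and method name in it (ShuffleContractSketch, IteratorReturningShuffle, FetchOnlyShuffle, Merger, consumeA, consumeB) is hypothetical and deliberately simplified; none of them is Hadoop's actual API, and records are plain strings only to keep the example runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified types for illustration only; these are NOT Hadoop classes.
public class ShuffleContractSketch {

    /** Shape A: run() itself hands the caller an iterator over the merged records. */
    interface IteratorReturningShuffle {
        Iterator<String> run();
    }

    /** Shape B: run() only fetches and returns nothing. */
    interface FetchOnlyShuffle {
        void run();
    }

    /** Shape B relies on a separate merger to produce the iterator. */
    interface Merger {
        Iterator<String> mergedIterator();
    }

    /** A caller written against shape A consumes whatever run() returns. */
    static List<String> consumeA(IteratorReturningShuffle shuffle) {
        return drain(shuffle.run());
    }

    /** A caller written against shape B must call run(), then ask the merger. */
    static List<String> consumeB(FetchOnlyShuffle shuffle, Merger merger) {
        shuffle.run();
        return drain(merger.mergedIterator());
    }

    /** Copies all remaining elements of the iterator into a list. */
    private static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        // Shape A: the shuffle returns the (already ordered) records directly.
        System.out.println(consumeA(() -> Arrays.asList("a", "b").iterator()));

        // Shape B: the shuffle fills a buffer; the merger sorts and exposes it.
        List<String> fetched = new ArrayList<>();
        FetchOnlyShuffle fetchOnly = () -> {
            fetched.add("b");
            fetched.add("a");
        };
        Merger merger = () -> {
            List<String> sorted = new ArrayList<>(fetched);
            Collections.sort(sorted);
            return sorted.iterator();
        };
        System.out.println(consumeB(fetchOnly, merger));
    }
}
```

A caller compiled against shape A cannot be handed a shape B shuffle unchanged, because there is no return value for it to iterate over; that is the kind of mismatch the comment describes. How the real patches should resolve it is left to the discussion on the issue.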
