Posted to mapreduce-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2013/01/10 05:38:17 UTC

[jira] [Updated] (MAPREDUCE-4808) Allow reduce-side merge to be pluggable

     [ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-4808:
-------------------------------------

    Status: Open  (was: Patch Available)

Asokan, sorry for the delay; I was away traveling home over the holidays.

I have more comments, but I'll put some here to keep the discussion going.

Thanks for the design doc, but I was looking for thoughts on *how* the plugin was going to be used for the use-cases you've mentioned (hash-join etc.), alternative designs, etc.

In any case, taking a step back, the 'goal' here is to make the 'merge' pluggable.

Reduce-side has 2 pieces:
# Shuffle - Move data from maps to the reduce.
# Merge - Merge already sorted map-outputs.

The rest (MergeManager etc.) are merely implementation details for managing memory and the like, and they become irrelevant in several scenarios as soon as we consider alternatives to the current HTTP-based shuffle (several exist, such as RDMA).

Your current approach tries to encapsulate and enshrine the current implementation of the reduce task, which I'm not wild about. By this I mean you are focusing too much on the current state and introducing interfaces that are unnecessary for now and might not suffice in the future.

I really don't think we should be tying Shuffle & Merge together, as you have done by introducing yet another new interface (regardless of whether it's public or not).


As I've noted above, adding a simple 'Merge' interface with one 'merge' call will address all of the use-cases you have outlined. If not, let's discuss.
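
To make the suggestion concrete, here is a minimal sketch of what a 'Merge' interface with one 'merge' call could look like. This is only an illustration, not the Hadoop API and not anything from the attached patches: the names Merge and KWayMerge, the use of plain Iterators to stand in for sorted map-output segments, and the method signature are all assumptions. The k-way merge shown is just one possible plugin; a hash-join style plugin would implement the same single call differently.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical interface (an assumption, not an existing Hadoop type):
// a single 'merge' call over already sorted map-output segments.
interface Merge<T> {
    Iterator<T> merge(List<Iterator<T>> sortedSegments, Comparator<T> comparator);
}

// One possible plugin: a plain k-way merge driven by a priority queue.
final class KWayMerge<T> implements Merge<T> {

    // Pairs the current head element of a segment with the rest of that segment.
    private static final class Head<E> {
        final E value;
        final Iterator<E> rest;

        Head(E value, Iterator<E> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    @Override
    public Iterator<T> merge(List<Iterator<T>> sortedSegments, Comparator<T> comparator) {
        // Order the heap by the head element of each segment.
        PriorityQueue<Head<T>> heap =
            new PriorityQueue<>((a, b) -> comparator.compare(a.value, b.value));

        // Seed the heap with the first element of every non-empty segment.
        for (Iterator<T> segment : sortedSegments) {
            if (segment.hasNext()) {
                heap.add(new Head<>(segment.next(), segment));
            }
        }

        return new Iterator<T>() {
            @Override
            public boolean hasNext() {
                return !heap.isEmpty();
            }

            @Override
            public T next() {
                // remove() throws NoSuchElementException once all segments are drained.
                Head<T> head = heap.remove();
                if (head.rest.hasNext()) {
                    heap.add(new Head<>(head.rest.next(), head.rest));
                }
                return head.value;
            }
        };
    }
}
```

In this sketch the shuffle side only has to hand over sorted segments; nothing in the interface refers to HTTP fetching or memory management, so a different shuffle (RDMA, say) could feed the same merge plugin.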

                
> Allow reduce-side merge to be pluggable
> ---------------------------------------
>
>                 Key: MAPREDUCE-4808
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>    Affects Versions: 2.0.2-alpha
>            Reporter: Arun C Murthy
>            Assignee: Mariappan Asokan
>             Fix For: 2.0.3-alpha
>
>         Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf
>
>
> Allow reduce-side merge to be pluggable for MAPREDUCE-2454
