Posted to mapreduce-issues@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2013/01/10 05:38:17 UTC
[jira] [Updated] (MAPREDUCE-4808) Allow reduce-side merge to be pluggable
[ https://issues.apache.org/jira/browse/MAPREDUCE-4808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Arun C Murthy updated MAPREDUCE-4808:
-------------------------------------
Status: Open (was: Patch Available)
Asokan, sorry for the delay; I've been away traveling home during the holidays.
I have more comments, but I'll put some here to keep the discussion going.
Thanks for the design doc, but I was looking for thoughts on *how* the plugin was going to be used for the use-cases you've mentioned (hash-join etc.), design alternatives, etc.
IAC, taking a step back, the 'goal' here is to make the 'merge' pluggable.
Reduce-side has 2 pieces:
# Shuffle - Move data from maps to the reduce.
# Merge - Merge already sorted map-outputs.
The rest (MergeManager etc.) are merely implementation details to manage memory etc., which are irrelevant in several scenarios as soon as we consider alternatives to the current HTTP-based shuffle (several alternatives exist, such as RDMA).
Your current approach tries to encapsulate and enshrine the current implementation of the reduce task, which I'm not wild about. By this I mean, you are focussing too much on the current state and trying to make interfaces which are unnecessary for now and might not suffice for the future.
I really don't think we should be tying Shuffle & Merge as you have done by introducing yet another new interface (regardless of whether it's public or not).
As I've noted above, adding a simple 'Merge' interface with one 'merge' call will address all of the use-cases you have outlined. If not, let's discuss.
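For illustration only, here is a minimal sketch of what a single-call 'Merge' interface could look like. Everything in it is an assumption made for the example: the names Merge, SortedMerge and ConcatMerge are hypothetical and are not existing Hadoop classes, and plain Java iterators stand in for the real map-output segments. SortedMerge shows the usual k-way merge of already sorted map-outputs; ConcatMerge shows a plugin that skips ordering, as a consumer that builds its own hash table (e.g. for a hash-join) might want.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical plugin contract (NOT an existing Hadoop interface):
// one call that turns the fetched map-output segments into one stream.
interface Merge<K extends Comparable<K>> {
    Iterator<K> merge(List<Iterator<K>> segments);
}

// Sketch of the default behaviour: k-way merge of already sorted segments.
final class SortedMerge<K extends Comparable<K>> implements Merge<K> {
    // Heap entry: the current head of a segment plus the rest of that segment.
    private static final class Head<K> {
        final K key;
        final Iterator<K> rest;
        Head(K key, Iterator<K> rest) { this.key = key; this.rest = rest; }
    }

    @Override
    public Iterator<K> merge(List<Iterator<K>> segments) {
        PriorityQueue<Head<K>> heap =
            new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (Iterator<K> seg : segments) {
            if (seg.hasNext()) heap.add(new Head<>(seg.next(), seg));
        }
        return new Iterator<K>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }
            @Override public K next() {
                Head<K> h = heap.remove(); // NoSuchElementException if exhausted
                if (h.rest.hasNext()) heap.add(new Head<>(h.rest.next(), h.rest));
                return h.key;
            }
        };
    }
}

// Sketch of an alternative plugin: no ordering, segments are simply chained.
final class ConcatMerge<K extends Comparable<K>> implements Merge<K> {
    @Override
    public Iterator<K> merge(List<Iterator<K>> segments) {
        Iterator<Iterator<K>> outer = segments.iterator();
        return new Iterator<K>() {
            private Iterator<K> current = Collections.emptyIterator();
            @Override public boolean hasNext() {
                // Advance past exhausted segments until one has data.
                while (!current.hasNext() && outer.hasNext()) current = outer.next();
                return current.hasNext();
            }
            @Override public K next() { hasNext(); return current.next(); }
        };
    }
}
```

Under these assumptions the reduce task would depend only on Merge.merge(), so swapping SortedMerge for ConcatMerge (or anything else) would not touch the shuffle side or expose memory-management details.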
> Allow reduce-side merge to be pluggable
> ---------------------------------------
>
> Key: MAPREDUCE-4808
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-4808
> Project: Hadoop Map/Reduce
> Issue Type: New Feature
> Affects Versions: 2.0.2-alpha
> Reporter: Arun C Murthy
> Assignee: Mariappan Asokan
> Fix For: 2.0.3-alpha
>
> Attachments: COMBO-mapreduce-4809-4812-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, mapreduce-4808.patch, MergeManagerPlugin.pdf
>
>
> Allow reduce-side merge to be pluggable for MAPREDUCE-2454
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira