Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/05/18 07:01:45 UTC
[jira] Commented: (HADOOP-5831) Implement memory-to-memory merge in the reduce
[ https://issues.apache.org/jira/browse/HADOOP-5831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710271#action_12710271 ]
dhruba borthakur commented on HADOOP-5831:
------------------------------------------
This looks like a really impressive performance gain! Awesome.
> Implement memory-to-memory merge in the reduce
> ----------------------------------------------
>
> Key: HADOOP-5831
> URL: https://issues.apache.org/jira/browse/HADOOP-5831
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 0.21.0
>
>
> HADOOP-3446 fixed the reduce to not flush the in-memory shuffled map-outputs before feeding them to the reduce. However, for latency-sensitive applications with lots of memory, like the terasort, this hurts performance because the fan-in for the final in-memory merge is too large (all 8000 map-outputs were in memory).
> When I put in an intermediate memory-to-memory merge for the terasort's reduce (thereby avoiding disk I/O) to cut the fan-in from 8000 to <100, the 'reduce' phase (including the local datanode write) sped up 2.5x (from 10s to 4s).
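
The idea in the description above can be illustrated with a minimal sketch. This is not Hadoop's implementation and does not come from the HADOOP-5831 patch; the function names (merge_runs, reduce_fan_in, final_merge) and the max_fan_in parameter are hypothetical, and plain Python lists stand in for the in-memory map-output segments. It only shows how an intermediate in-memory pass can cut the number of sorted segments that the final merge has to read from.

```python
import heapq
from typing import Iterator, List, Tuple

# Hypothetical record type: a (key, value) pair. Each segment is a sorted list.
Record = Tuple[int, int]


def merge_runs(runs: List[List[Record]]) -> List[Record]:
    """Merge several sorted in-memory runs into one sorted in-memory run."""
    return list(heapq.merge(*runs))


def reduce_fan_in(segments: List[List[Record]], max_fan_in: int) -> List[List[Record]]:
    """Merge groups of segments in memory until at most max_fan_in remain.

    Nothing is written to disk; each pass replaces up to max_fan_in
    segments with a single larger sorted segment.
    """
    if max_fan_in < 2:
        raise ValueError("max_fan_in must be at least 2")
    while len(segments) > max_fan_in:
        segments = [
            merge_runs(segments[i:i + max_fan_in])
            for i in range(0, len(segments), max_fan_in)
        ]
    return segments


def final_merge(segments: List[List[Record]]) -> Iterator[Record]:
    """Stream the final sorted output that would be fed to the reduce."""
    return heapq.merge(*segments)
```

With 8000 input segments and max_fan_in set to 100, a single intermediate pass leaves 80 segments, so the final merge reads from 80 sources instead of 8000. Whether that reproduces the speed-up reported above depends on the real shuffle implementation and record sizes, which this sketch does not model.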