Posted to mapreduce-issues@hadoop.apache.org by "Sandy Ryza (JIRA)" <ji...@apache.org> on 2013/08/15 22:01:47 UTC

[jira] [Commented] (MAPREDUCE-5462) In map-side sort, swap entire meta entries instead of indexes for better cache performance

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13741418#comment-13741418 ] 

Sandy Ryza commented on MAPREDUCE-5462:
---------------------------------------

Submitting a patch based on half of Todd's patch from MAPREDUCE-3235.
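
For readers new to the idea in the issue title, here is a minimal sketch of the two swap strategies. It is not the patch itself: the class names, the four-int entry layout, and the integer stand-in key are all assumptions made for illustration, and insertion sort is used only to keep the example short.

```java
import java.util.Arrays;

/** Simplified model of a map-side sort buffer; all names here are hypothetical. */
public class MetaSwapSketch {
    static final int NMETA = 4;       // ints per metadata entry (assumed layout)
    static final int PARTITION = 0;   // offset of the partition field (assumed)
    static final int KEY = 1;         // offset of a stand-in integer key (assumed)

    /** Minimal sort contract: compare and swap by logical position. */
    interface Sortable {
        int compare(int i, int j);
        void swap(int i, int j);
    }

    /** Indirect variant: sorting permutes a separate index array. */
    static final class IndexSwap implements Sortable {
        final int[] meta;
        final int[] index;
        IndexSwap(int[] meta) {
            this.meta = meta;
            index = new int[meta.length / NMETA];
            for (int i = 0; i < index.length; i++) index[i] = i;
        }
        public int compare(int i, int j) {
            // Every comparison goes through index[], so reads of meta[] are scattered.
            return compareEntries(meta, index[i] * NMETA, index[j] * NMETA);
        }
        public void swap(int i, int j) {
            int t = index[i]; index[i] = index[j]; index[j] = t;
        }
    }

    /** Direct variant: sorting moves whole entries, so logical neighbours are adjacent in memory. */
    static final class EntrySwap implements Sortable {
        final int[] meta;
        EntrySwap(int[] meta) { this.meta = meta; }
        public int compare(int i, int j) {
            return compareEntries(meta, i * NMETA, j * NMETA);
        }
        public void swap(int i, int j) {
            for (int f = 0; f < NMETA; f++) {
                int t = meta[i * NMETA + f];
                meta[i * NMETA + f] = meta[j * NMETA + f];
                meta[j * NMETA + f] = t;
            }
        }
    }

    /** Orders entries by partition first, then by the stand-in key. */
    static int compareEntries(int[] meta, int a, int b) {
        int c = Integer.compare(meta[a + PARTITION], meta[b + PARTITION]);
        return c != 0 ? c : Integer.compare(meta[a + KEY], meta[b + KEY]);
    }

    /** Insertion sort, chosen for brevity; any compare/swap sort would do. */
    static void sort(Sortable s, int n) {
        for (int i = 1; i < n; i++)
            for (int j = i; j > 0 && s.compare(j - 1, j) > 0; j--)
                s.swap(j - 1, j);
    }

    public static void main(String[] args) {
        int[] meta = {1, 7, 0, 0,   0, 9, 0, 0,   1, 3, 0, 0,   0, 2, 0, 0};
        IndexSwap indirect = new IndexSwap(meta.clone());
        sort(indirect, 4);
        EntrySwap direct = new EntrySwap(meta.clone());
        sort(direct, 4);
        System.out.println(Arrays.toString(indirect.index)); // sorted order as indexes
        System.out.println(Arrays.toString(direct.meta));    // entries physically sorted
    }
}
```

Both variants produce the same ordering. The trade-off is that swapping whole entries moves more data per swap, while each comparison then reads entries that sit next to each other rather than following an index to an arbitrary location.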

I benchmarked the change with the LocalJobRunner, using a WordCount job with a single map task on 64 MB of data.  I did five runs with and without the patch.  In all runs, the rest of the job after the map task finished took less than a second.  I measured cache misses using the perf command.

Average cache misses without the change: 165,083,881 (stddev 986,099)
Average job run time without the change: 14.46 seconds (stddev 1.24)
Average cache misses with the change: 83,130,729 (stddev 342,826)
Average job run time with the change: 12.018 seconds (stddev 1.95)
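
To put the averages above in relative terms, the small helper below (a hypothetical class, not part of the patch) computes the fractional reduction for each metric. On these figures it works out to roughly 49.6% fewer cache misses and roughly 16.9% less run time. The run-time standard deviations (1.24 and 1.95 seconds) are of similar size to the 2.44-second difference in means, so with five runs the run-time gain is less firmly established than the cache-miss reduction.

```java
import java.util.Locale;

/** Hypothetical helper for expressing the benchmark averages as relative reductions. */
public class RelativeChange {
    /** Fractional reduction going from `before` to `after`. */
    static double reduction(double before, double after) {
        return (before - after) / before;
    }

    public static void main(String[] args) {
        // Averages copied from the benchmark figures in this comment.
        double missReduction = reduction(165_083_881, 83_130_729);
        double timeReduction = reduction(14.46, 12.018);
        System.out.printf(Locale.ROOT, "cache misses: %.1f%% fewer%n", 100 * missReduction);
        System.out.printf(Locale.ROOT, "run time: %.1f%% shorter%n", 100 * timeReduction);
    }
}
```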
                
> In map-side sort, swap entire meta entries instead of indexes for better cache performance 
> -------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5462
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5462
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: performance, task
>    Affects Versions: 2.1.0-beta
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira