Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/05/16 15:09:55 UTC

[jira] Updated: (HADOOP-3366) Shuffle/Merge improvements

     [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated HADOOP-3366:
--------------------------------

    Attachment: 3366.1.patch

(After an offline discussion, I agree with the suggestion that we should not use the file abstraction for the in-memory merge. The file streams add overhead, which is undesirable in a performance-critical section.)
This half-done patch is up for a high-level review. It introduces a ByteArrayManager that the shuffle can use to store files as raw byte arrays instead of as files in the ramfs. It also defines a merge routine that can merge a set of such byte arrays. The remaining work, i.e., changing the shuffle code to use the ByteArrayManager instead of the ramfs, depends on the patch for HADOOP-2095 (since that patch changes the layout of the intermediate sequence file). I'll see what else can be done before that patch is available.
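
To make the idea of merging a set of sorted byte arrays concrete, here is a minimal k-way merge sketch using a heap. It is not taken from 3366.1.patch: the class and method names are hypothetical, the record framing is assumed to be two 4-byte big-endian lengths followed by the key and value bytes, and keys are compared as unsigned bytes where the real code would presumably use the job's comparator.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; none of these names come from the attached patch.
final class InMemoryMergeSketch {

    /** Cursor over one sorted segment of length-prefixed records. */
    static final class Segment implements Comparable<Segment> {
        private final ByteBuffer buf;
        byte[] key;
        byte[] value;

        Segment(byte[] data) {
            this.buf = ByteBuffer.wrap(data);
        }

        /** Loads the next record; returns false when the segment is exhausted. */
        boolean advance() {
            if (!buf.hasRemaining()) {
                return false;
            }
            int keyLen = buf.getInt(); // assumption: 4-byte big-endian lengths
            int valLen = buf.getInt();
            key = new byte[keyLen];
            value = new byte[valLen];
            buf.get(key);
            buf.get(value);
            return true;
        }

        @Override
        public int compareTo(Segment other) {
            // Unsigned lexicographic order; a stand-in for a real key comparator.
            int n = Math.min(key.length, other.key.length);
            for (int i = 0; i < n; i++) {
                int cmp = (key[i] & 0xff) - (other.key[i] & 0xff);
                if (cmp != 0) {
                    return cmp;
                }
            }
            return key.length - other.key.length;
        }
    }

    /** Merges sorted segments into one sorted segment with the same framing. */
    static byte[] merge(List<byte[]> segments) {
        PriorityQueue<Segment> heap = new PriorityQueue<>();
        int total = 0;
        for (byte[] data : segments) {
            total += data.length;
            Segment s = new Segment(data);
            if (s.advance()) {
                heap.add(s);
            }
        }
        // Records are copied unchanged, so the output size is the sum of the inputs.
        ByteBuffer out = ByteBuffer.allocate(total);
        while (!heap.isEmpty()) {
            Segment s = heap.poll(); // segment whose current key is smallest
            out.putInt(s.key.length).putInt(s.value.length).put(s.key).put(s.value);
            if (s.advance()) {
                heap.add(s); // re-insert with its next record
            }
        }
        return out.array();
    }

    /** Encodes alternating key/value strings into one segment (demo helper). */
    static byte[] segment(String... kv) {
        int size = 0;
        for (String s : kv) {
            size += 4 + s.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (int i = 0; i < kv.length; i += 2) {
            byte[] k = kv[i].getBytes(StandardCharsets.UTF_8);
            byte[] v = kv[i + 1].getBytes(StandardCharsets.UTF_8);
            out.putInt(k.length).putInt(v.length).put(k).put(v);
        }
        return out.array();
    }

    /** Lists the keys of a segment in stored order (demo helper). */
    static List<String> keys(byte[] data) {
        List<String> keys = new ArrayList<>();
        Segment s = new Segment(data);
        while (s.advance()) {
            keys.add(new String(s.key, StandardCharsets.UTF_8));
        }
        return keys;
    }

    public static void main(String[] args) {
        byte[] a = segment("apple", "1", "cherry", "3");
        byte[] b = segment("banana", "2", "date", "4");
        System.out.println(keys(merge(Arrays.asList(a, b))));
    }
}
```

Working directly on byte arrays keeps stream objects out of the inner loop. The sketch still copies each key and value once per record; a tuned version would more likely compare and copy by offset.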

By the way, I wrote the patch assuming the layout <key-len><val-len><key><value> (the difference from the earlier proposed layout is that the two lengths are adjacent). That makes parsing the byte arrays simpler.
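
As a rough illustration of why adjacent lengths simplify parsing: one fixed-size header read yields the key offset, the value offset, and the start of the next record. The sketch below is hypothetical and is not code from the patch. It assumes each length is a 4-byte big-endian int, which this message does not specify.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the <key-len><val-len><key><value> layout.
final class RecordLayoutSketch {
    // Assumption: both lengths are 4-byte big-endian ints.
    static final int HEADER_BYTES = 8;

    /** Encodes one record as <key-len><val-len><key><value>. */
    static byte[] encode(byte[] key, byte[] value) {
        return ByteBuffer.allocate(HEADER_BYTES + key.length + value.length)
                .putInt(key.length)
                .putInt(value.length)
                .put(key)
                .put(value)
                .array();
    }

    /** The key always starts right after the fixed-size header. */
    static int keyOffset(int recordOffset) {
        return recordOffset + HEADER_BYTES;
    }

    /** The value starts after the header and the key bytes. */
    static int valueOffset(byte[] buf, int recordOffset) {
        int keyLen = ByteBuffer.wrap(buf).getInt(recordOffset);
        return recordOffset + HEADER_BYTES + keyLen;
    }

    /** Skips a whole record using only its header, without touching key or value. */
    static int nextRecordOffset(byte[] buf, int recordOffset) {
        ByteBuffer b = ByteBuffer.wrap(buf);
        int keyLen = b.getInt(recordOffset);
        int valLen = b.getInt(recordOffset + 4);
        return recordOffset + HEADER_BYTES + keyLen + valLen;
    }
}
```

With the lengths split around the key, finding the value length would require first reading the key length and stepping over the key.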

> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>         Attachments: 3366.1.patch
>
>
> This is intended to be a meta-issue to track various improvements to shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.