Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2007/10/17 13:38:50 UTC

[jira] Updated: (HADOOP-1965) Handle map output buffers better

     [ https://issues.apache.org/jira/browse/HADOOP-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amar Kamat updated HADOOP-1965:
-------------------------------

    Attachment: 1965_single_proc_150mb_gziped.pdf

I am attaching results that compare a threaded spill (two DataOutputBuffers, each half the size of io.sort.mb, plus a separate thread for sorting and spilling) against the sequential spill done today. The test case is input that is non-splittable and must therefore be consumed by a single map task. The setup is as follows:
* Data source: Random text using random-text-writer
* Key size: 10 words
* Data size: ~150 MB
* Input type: Gzip
* DFS block size: 200 MB
* # nodes: 1
* Job type: wordcount
* io.sort.mb: {5, 15, 25, 50, 75, 100, 125}
----
Results : See the attachment 1965_single_proc_150mb_gziped.pdf
----
Comments?
Let me know if a plain-text version of the comparison would be useful.

> Handle map output buffers better
> --------------------------------
>
>                 Key: HADOOP-1965
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1965
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>            Assignee: Amar Kamat
>         Attachments: 1965_single_proc_150mb_gziped.pdf
>
>
> Today, the map task stops calling the map method while sort/spill is using the (single instance of the) map output buffer. One way to improve map task performance is to provide a second buffer that map outputs can be written to while sort/spill is using the first buffer.
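
The two-buffer idea described in this issue can be illustrated with a minimal sketch. This is not the attached patch and not Hadoop's MapTask code: the class name DoubleBufferedCollector, its methods, and the use of in-memory string lists (instead of DataOutputBuffer and spill files) are all assumptions made for illustration. The capacity is counted in records rather than bytes, standing in for half of io.sort.mb.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch of a threaded spill with two buffers. */
class DoubleBufferedCollector {
    private final int capacity;                       // records per buffer (stand-in for io.sort.mb / 2)
    private List<String> active = new ArrayList<>();  // buffer the map side is currently writing to
    private Future<?> pendingSpill;                   // sort/spill of the other buffer, if one is in flight
    private final ExecutorService spiller = Executors.newSingleThreadExecutor();
    private final List<List<String>> spills =
            Collections.synchronizedList(new ArrayList<>());

    DoubleBufferedCollector(int capacity) {
        this.capacity = capacity;
    }

    /** Called for every map output record. */
    void collect(String record) {
        active.add(record);
        if (active.size() >= capacity) {
            swapAndSpill();
        }
    }

    /** Hands the full buffer to the spill thread and continues with a fresh one. */
    private void swapAndSpill() {
        // The map side blocks only if the previous spill has not finished yet.
        awaitPendingSpill();
        final List<String> full = active;
        active = new ArrayList<>();
        pendingSpill = spiller.submit(() -> {
            Collections.sort(full);   // sort the buffer contents
            spills.add(full);         // stand-in for writing a spill file to disk
        });
    }

    private void awaitPendingSpill() {
        if (pendingSpill == null) {
            return;
        }
        try {
            pendingSpill.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for spill", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("spill failed", e.getCause());
        }
    }

    /** Spills whatever is left, waits for the spill thread, and returns all spills in order. */
    List<List<String>> close() {
        if (!active.isEmpty()) {
            swapAndSpill();
        }
        awaitPendingSpill();
        spiller.shutdown();
        return spills;
    }
}
```

In this sketch the caller would invoke collect() once per map output record and close() at the end of the task. With a sequential spill, the sort and write would run inside collect() itself, so the map method would wait on every spill; here it waits only when both buffers are full at the same time.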

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.