Posted to mapreduce-dev@hadoop.apache.org by "Scott Chen (JIRA)" <ji...@apache.org> on 2010/12/07 20:50:13 UTC

[jira] Created: (MAPREDUCE-2212) MapTask and ReduceTask should only compress/decompress the final map output file

MapTask and ReduceTask should only compress/decompress the final map output file
--------------------------------------------------------------------------------

                 Key: MAPREDUCE-2212
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2212
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: task
    Affects Versions: 0.23.0
            Reporter: Scott Chen
            Assignee: Scott Chen
             Fix For: 0.23.0


Currently, if we set mapred.map.output.compression.codec:
1. MapTask compresses every spill, decompresses every spill again to merge them, and then compresses the final map output file.
2. ReduceTask decompresses, merges and recompresses the map output files, and repeats the decompression/compression on every merge pass.

This causes all of the data to be compressed and decompressed many times.
The reason we need mapred.map.output.compression.codec is to reduce network traffic.
We should not compress and decompress the data again and again during the merge sort.

We should compress only the final map output file that is transmitted over the network.
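
To make the cost concrete, here is a toy model of the map side only. It is not Hadoop code: run_map_task, its arguments and the use of zlib as the codec are all made up for illustration. It counts how many uncompressed bytes pass through the compressor and the decompressor when spills are compressed (roughly the current behaviour) and when only the final merged output is compressed (the proposal).

```python
import heapq
import zlib


def run_map_task(spills, compress_spills):
    """Toy model of one map task (hypothetical helper, not a Hadoop API).

    spills: list of spills, each a list of byte-string records.
    compress_spills: True models the current behaviour (every spill goes
    through the codec); False models the proposal (only the final file does).
    Returns (final_compressed_output, bytes_compressed, bytes_decompressed).
    """
    bytes_compressed = 0    # uncompressed bytes fed to the compressor
    bytes_decompressed = 0  # uncompressed bytes produced by the decompressor

    # Spill phase: each sorted spill is written to "disk".
    spill_files = []
    for records in spills:
        raw = b"\n".join(sorted(records))
        if compress_spills:
            bytes_compressed += len(raw)
            raw = zlib.compress(raw)
        spill_files.append(raw)

    # Merge phase: every spill is read back and merged into one sorted run.
    runs = []
    for blob in spill_files:
        if compress_spills:
            blob = zlib.decompress(blob)
            bytes_decompressed += len(blob)
        runs.append(blob.split(b"\n"))
    merged = b"\n".join(heapq.merge(*runs))

    # Only the final map output has to be compressed for the network.
    bytes_compressed += len(merged)
    return zlib.compress(merged), bytes_compressed, bytes_decompressed


if __name__ == "__main__":
    spills = [[b"b", b"d"], [b"a", b"c"]]
    for flag in (True, False):
        _, comp, decomp = run_map_task(spills, flag)
        print(f"compress_spills={flag}: compressed={comp} decompressed={decomp}")
```

Both modes produce the same final file; the difference is only in how much data goes through the codec on the way. The reduce side would add further decompress/compress rounds per merge pass, which this sketch does not model.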

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAPREDUCE-2212) MapTask and ReduceTask should only compress/decompress the final map output file

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen resolved MAPREDUCE-2212.
-----------------------------------

    Resolution: Won't Fix

I am closing this now because I don't think there is much benefit in doing this.
It would also increase the complexity of the code.

> MapTask and ReduceTask should only compress/decompress the final map output file
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2212
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2212
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task
>    Affects Versions: 0.23.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>             Fix For: 0.23.0
>
>
> Currently, if we set mapred.map.output.compression.codec:
> 1. MapTask compresses every spill, decompresses every spill again to merge them, and then compresses the final map output file.
> 2. ReduceTask decompresses, merges and recompresses the map output files, and repeats the decompression/compression on every merge pass.
> This causes all of the data to be compressed and decompressed many times.
> The reason we need mapred.map.output.compression.codec is to reduce network traffic.
> We should not compress and decompress the data again and again during the merge sort.
> We should only compress the final map output file that will be transmitted over the network.
