Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2007/04/03 02:27:32 UTC

[jira] Created: (HADOOP-1194) map output should not do block level compression

map output should not do block level compression
------------------------------------------------

                 Key: HADOOP-1194
                 URL: https://issues.apache.org/jira/browse/HADOOP-1194
             Project: Hadoop
          Issue Type: Bug
            Reporter: Runping Qi



If the user enables map output compression, the compression style should be record level, not block level, since block-level compression of map outputs degrades performance significantly.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1194) map output should not do block level compression

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486724 ] 

Owen O'Malley commented on HADOOP-1194:
---------------------------------------

+1

> map output should not do block level compression
> ------------------------------------------------
>
>                 Key: HADOOP-1194
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1194
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Runping Qi
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1194_20070403_1.patch
>
>
> If the user enables map output compression, the compression style should be record level, not block level, since block-level compression of map outputs degrades performance significantly.



[jira] Commented: (HADOOP-1194) map output should not do block level compression

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486924 ] 

Hadoop QA commented on HADOOP-1194:
-----------------------------------

Integrated in Hadoop-Nightly #48 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/48/)



[jira] Updated: (HADOOP-1194) map output should not do block level compression

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1194:
----------------------------------

    Fix Version/s: 0.13.0
         Assignee: Owen O'Malley
       Issue Type: Improvement  (was: Bug)

I interpret this request as asking that the default for map.output.compression.type be RECORD rather than the value of io.seqfile.compression.type. Is that correct, Runping?

> map output should not do block level compression
> ------------------------------------------------
>
>                 Key: HADOOP-1194
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1194
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Runping Qi
>         Assigned To: Owen O'Malley
>             Fix For: 0.13.0
>
>
> If the user enables map output compression, the compression style should be record level, not block level, since block-level compression of map outputs degrades performance significantly.



[jira] Updated: (HADOOP-1194) map output should not do block level compression

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1194:
----------------------------------

    Status: Patch Available  (was: Open)

This patch should provide what Runping is looking for...



[jira] Commented: (HADOOP-1194) map output should not do block level compression

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486226 ] 

Runping Qi commented on HADOOP-1194:
------------------------------------

This Jira is specifically about the case where the user enables map output compression and also sets 'io.seqfile.compression.type' to BLOCK (in order to get block-level compression for the reduce outputs). In this case, the map output is also block-level compressed, which is not desirable.

So both Owen's and Arun's interpretations are correct.
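
The scenario described here can be sketched roughly as follows. This is only an illustration, not Hadoop code: a plain Map stands in for the job configuration, the class and method names are hypothetical, the property keys are copied as they are written in this thread (the exact names in the Hadoop source may differ), and "NONE" is just this sketch's label for "map output compression disabled".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's job configuration.
final class BlockCompressionScenario {
    // Keys as written in this thread; treat the exact names as assumptions.
    static final String COMPRESS_MAP_OUTPUT = "mapred.compress.map.output";
    static final String MAP_OUTPUT_TYPE = "mapred.output.compression.type";
    static final String SEQFILE_TYPE = "io.seqfile.compression.type";

    // Effective map output compression type when the map output setting
    // inherits from io.seqfile.compression.type (the behaviour reported here).
    static String effectiveMapOutputType(Map<String, String> conf) {
        boolean enabled =
            Boolean.parseBoolean(conf.getOrDefault(COMPRESS_MAP_OUTPUT, "false"));
        if (!enabled) {
            return "NONE"; // sketch-only label for "not compressed"
        }
        String inherited = conf.getOrDefault(SEQFILE_TYPE, "RECORD");
        return conf.getOrDefault(MAP_OUTPUT_TYPE, inherited);
    }

    // The configuration from the report: map output compression on, and
    // BLOCK requested so that the reduce outputs are block compressed.
    static Map<String, String> scenario() {
        Map<String, String> conf = new HashMap<>();
        conf.put(COMPRESS_MAP_OUTPUT, "true");
        conf.put(SEQFILE_TYPE, "BLOCK");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> conf = scenario();
        // The map output silently inherits BLOCK.
        System.out.println(effectiveMapOutputType(conf));

        // Setting the map output type explicitly avoids the inheritance.
        conf.put(MAP_OUTPUT_TYPE, "RECORD");
        System.out.println(effectiveMapOutputType(conf));
    }
}
```

Under these assumptions the first lookup yields BLOCK and the second RECORD, which is why an explicit per-job setting works as a stopgap while the default itself is being discussed.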





[jira] Commented: (HADOOP-1194) map output should not do block level compression

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486249 ] 

Hadoop QA commented on HADOOP-1194:
-----------------------------------

+1, because http://issues.apache.org/jira/secure/attachment/12354810/HADOOP-1194_20070403_1.patch applied and successfully tested against trunk revision http://svn.apache.org/repos/asf/lucene/hadoop/trunk/524929. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch



[jira] Commented: (HADOOP-1194) map output should not do block level compression

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12486213 ] 

Arun C Murthy commented on HADOOP-1194:
---------------------------------------

Runping, 

'mapred.compress.map.output' is the boolean that controls whether map outputs are compressed. If it is set to true (@see MapTask.java:279), 'mapred.output.compression.type' can be used to specify RECORD or BLOCK. This defaults to SequenceFile's compression type, i.e. 'io.seqfile.compression.type', which in turn defaults to RECORD compression (@see JobConf.getMapOutputCompressionType).

Does this help? Or do you propose we change JobConf.getMapOutputCompressionType to return RECORD if 'mapred.output.compression.type' is unset, i.e. not fall back to io.seqfile.compression.type? Thanks for the clarification...
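
For readers following along, the two lookup behaviours being compared can be sketched as below. This is a minimal illustration under stated assumptions, not the actual JobConf code: a plain Map replaces the configuration object, the class and method names are made up for this example, and the property keys are taken as written in this comment.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the two defaults discussed above.
final class MapOutputCompressionLookup {
    // Keys as written in the comment; exact names in Hadoop may differ.
    static final String MAP_OUTPUT_TYPE = "mapred.output.compression.type";
    static final String SEQFILE_TYPE = "io.seqfile.compression.type";

    // Behaviour as described: fall back to the SequenceFile compression
    // type, which itself falls back to RECORD.
    static String describedLookup(Map<String, String> conf) {
        String seqFileDefault = conf.getOrDefault(SEQFILE_TYPE, "RECORD");
        return conf.getOrDefault(MAP_OUTPUT_TYPE, seqFileDefault);
    }

    // Proposed alternative: ignore io.seqfile.compression.type entirely
    // and return RECORD when the map output type is unset.
    static String proposedLookup(Map<String, String> conf) {
        return conf.getOrDefault(MAP_OUTPUT_TYPE, "RECORD");
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(describedLookup(conf)); // RECORD
        System.out.println(proposedLookup(conf));  // RECORD

        // The two only diverge once the SequenceFile type is set.
        conf.put(SEQFILE_TYPE, "BLOCK");
        System.out.println(describedLookup(conf)); // BLOCK
        System.out.println(proposedLookup(conf));  // RECORD
    }
}
```

With nothing configured both variants agree on RECORD; they differ only when 'io.seqfile.compression.type' is set and the map output type is not.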

> map output should not do block level compression
> ------------------------------------------------
>
>                 Key: HADOOP-1194
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1194
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Runping Qi
>
> If the user enables map output compression, the compression style should be record level, not block level, since block-level compression of map outputs degrades performance significantly.



[jira] Updated: (HADOOP-1194) map output should not do block level compression

Posted by "Tom White (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1194:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Arun!



[jira] Updated: (HADOOP-1194) map output should not do block level compression

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1194:
----------------------------------

    Attachment: HADOOP-1194_20070403_1.patch

Patch to ensure map outputs are RECORD compressed by default... is this what you had in mind, Runping?



[jira] Updated: (HADOOP-1194) map output should not do block level compression

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1194:
-------------------------------

          Component/s: mapred
    Affects Version/s: 0.12.2



[jira] Assigned: (HADOOP-1194) map output should not do block level compression

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-1194:
-------------------------------------

    Assignee: Arun C Murthy  (was: Owen O'Malley)

> map output should not do block level compression
> ------------------------------------------------
>
>                 Key: HADOOP-1194
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1194
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Runping Qi
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>
> If the user enables map output compression, the compression style should be record level, not block level, since block-level compression of map outputs degrades performance significantly.
