Posted to dev@pig.apache.org by "Yan Zhou (JIRA)" <ji...@apache.org> on 2010/08/09 18:55:16 UTC

[jira] Updated: (PIG-1501) need to investigate the impact of compression on pig performance

     [ https://issues.apache.org/jira/browse/PIG-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-1501:
--------------------------

    Attachment: compress_perf_data_2.txt

The data set in the last tests was so small that the performance difference was lost in background noise.  This test case generates more temporary data.

In summary, lzo achieves a compression ratio of about 3% and runs about 4x faster than uncompressed; gzip achieves a compression ratio of less than 1% but runs 1%-2% slower than uncompressed. This is in line with the general observation that gzip compresses better but performs worse.
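
To make the two metrics concrete, here is a minimal sketch. It assumes that "compression ratio" means compressed size as a percentage of the original size (so a lower number is better), which is how the figures above read. Python's zlib at two levels stands in for a fast and a thorough codec; it is not lzo, and the helper names and the synthetic sample data are hypothetical, not taken from the attached test data.

```python
import time
import zlib
from typing import Tuple


def compression_ratio_pct(original_size: int, compressed_size: int) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100.0 * compressed_size / original_size


def relative_runtime_pct(baseline_seconds: float, run_seconds: float) -> float:
    """Runtime change versus the baseline, in percent (positive means slower)."""
    return 100.0 * (run_seconds - baseline_seconds) / baseline_seconds


def measure(data: bytes, level: int) -> Tuple[float, float]:
    """Compress data with zlib at the given level; return (ratio %, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    return compression_ratio_pct(len(data), len(compressed)), elapsed


if __name__ == "__main__":
    # Synthetic, highly repetitive input; NOT the PigMix data used in the tests.
    sample = b"user_id\tquery_term\ttimestamp\n" * 200_000
    for level in (1, 9):
        ratio, seconds = measure(sample, level)
        print(f"zlib level {level}: ratio {ratio:.2f}%, {seconds:.3f}s")
```

This only times the codec itself. In an MR job chain the end-to-end effect also depends on how much disk and network I/O the smaller intermediate data saves, which is what the attached measurements capture.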

> need to investigate the impact of compression on pig performance
> ----------------------------------------------------------------
>
>                 Key: PIG-1501
>                 URL: https://issues.apache.org/jira/browse/PIG-1501
>             Project: Pig
>          Issue Type: Test
>            Reporter: Olga Natkovich
>            Assignee: Yan Zhou
>             Fix For: 0.8.0
>
>         Attachments: compress_perf_data.txt, compress_perf_data_2.txt
>
>
> We would like to understand how compressing map results as well as reducer output in a chain of MR jobs impacts performance. We can use PigMix queries for this investigation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.