Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/06/12 02:40:07 UTC

[jira] Resolved: (PIG-686) PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce

     [ https://issues.apache.org/jira/browse/PIG-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-686.
--------------------------------

    Resolution: Won't Fix

We have experimented with this work, and the performance gains (at most 5-7%) are not sufficient to justify the complexity it would add to the code. Hopefully, once we integrate with AVRO, we will get the improvement.

> PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-686
>                 URL: https://issues.apache.org/jira/browse/PIG-686
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.2.0
>            Reporter: Olga Natkovich
>
> Currently, there is quite a bit of overhead in how the data is serialized in both cases, because type information is stored with each field.
> However, most of the time the data has a known and consistent schema, in which case it is sufficient to store the schema once.
> This change could significantly decrease the amount of intermediate data generated.
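
The trade-off in the issue description can be illustrated with a small sketch. It is a simplified model, not Pig's actual serialization code. The type tags (INT, LONG, CHARARRAY), the byte layout, and the function names are all hypothetical. It compares writing a one-byte type tag before every field with writing the schema once as a header, followed only by raw values. The second approach assumes every row matches that schema.

```python
import struct

# Hypothetical type tags for illustration only; NOT Pig's real DataType codes.
INT, LONG, CHARARRAY = 1, 2, 3


def encode_value(tag, value):
    """Encode a single field value with no type information attached."""
    if tag == INT:
        return struct.pack(">i", value)
    if tag == LONG:
        return struct.pack(">q", value)
    if tag == CHARARRAY:
        data = value.encode("utf-8")
        return struct.pack(">I", len(data)) + data  # length-prefixed string
    raise ValueError(f"unknown type tag {tag}")


def encode_tagged(rows, schema):
    """Per-field encoding: each field is preceded by its own 1-byte type tag."""
    out = bytearray()
    for row in rows:
        for tag, value in zip(schema, row):
            out += struct.pack(">B", tag)
            out += encode_value(tag, value)
    return bytes(out)


def encode_schema_once(rows, schema):
    """Schema-once encoding: a header with the field types, then untagged values.

    Only valid when every row conforms to the same schema.
    """
    out = bytearray(struct.pack(">B", len(schema)))  # number of fields
    out += bytes(schema)                             # one tag byte per field
    for row in rows:
        if len(row) != len(schema):
            raise ValueError("row does not match schema")
        for tag, value in zip(schema, row):
            out += encode_value(tag, value)
    return bytes(out)


def decode_schema_once(data):
    """Read back the header, then decode rows using the stored schema."""
    n = data[0]
    schema = list(data[1:1 + n])
    pos = 1 + n
    rows = []
    while pos < len(data):
        row = []
        for tag in schema:
            if tag == INT:
                (v,) = struct.unpack_from(">i", data, pos)
                pos += 4
            elif tag == LONG:
                (v,) = struct.unpack_from(">q", data, pos)
                pos += 8
            elif tag == CHARARRAY:
                (length,) = struct.unpack_from(">I", data, pos)
                pos += 4
                v = data[pos:pos + length].decode("utf-8")
                pos += length
            else:
                raise ValueError(f"unknown type tag {tag}")
            row.append(v)
        rows.append(tuple(row))
    return schema, rows


if __name__ == "__main__":
    schema = [INT, CHARARRAY, LONG]
    rows = [(i, "user%d" % i, i * 1000) for i in range(1000)]
    tagged = encode_tagged(rows, schema)
    compact = encode_schema_once(rows, schema)
    print(len(tagged), len(compact))
```

In this model the saving is exactly one byte per field, minus a small fixed header. For 1000 three-field rows that is 3000 tag bytes replaced by a 4-byte header. How much this matters in practice depends on how large the values are relative to the tags.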

-- 
This message is automatically generated by JIRA.