Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2009/06/12 02:40:07 UTC
[jira] Resolved: (PIG-686) PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce
[ https://issues.apache.org/jira/browse/PIG-686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich resolved PIG-686.
--------------------------------
Resolution: Won't Fix
We have experimented with this work, and the performance gains (at most 5-7%) are not large enough to justify the complexity it would add to the code. Hopefully, once we integrate with AVRO, we will get the improvement.
> PERFORMANCE: improve how data is stored between M-R jobs and between Map and Reduce
> -----------------------------------------------------------------------------------
>
> Key: PIG-686
> URL: https://issues.apache.org/jira/browse/PIG-686
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.2.0
> Reporter: Olga Natkovich
>
> Currently, there is quite a bit of overhead in how data is serialized in both cases, because type information is stored with each field.
> However, most of the time the data has a known and consistent schema, in which case it is sufficient to store the schema once.
> This change could substantially reduce the amount of intermediate data generated.
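To make the per-field overhead concrete, here is a minimal Python sketch comparing the two layouts described in the issue: tagging every field with its type versus writing the schema once as a header. The one-byte type tags, the big-endian encoding via `struct`, and the function names `serialize_tagged` and `serialize_with_schema` are hypothetical illustrations, not Pig's actual serialization code.

```python
import struct

# Hypothetical one-byte type tags; this is NOT Pig's actual on-disk format.
TAGS = {int: 1, float: 2, str: 3}


def tag_for(value):
    """Return the hypothetical type tag for a value."""
    try:
        return TAGS[type(value)]
    except KeyError:
        raise TypeError(f"unsupported type: {type(value).__name__}")


def encode_value(value):
    """Encode one value with no type information attached."""
    if type(value) is int:
        return struct.pack(">i", value)          # 4-byte int
    if type(value) is float:
        return struct.pack(">d", value)          # 8-byte double
    if type(value) is str:
        data = value.encode("utf-8")
        return struct.pack(">i", len(data)) + data  # length prefix + bytes
    raise TypeError(f"unsupported type: {type(value).__name__}")


def serialize_tagged(rows):
    """Per-field tagging: every field is preceded by its type tag."""
    out = bytearray()
    for row in rows:
        for value in row:
            out.append(tag_for(value))
            out += encode_value(value)
    return bytes(out)


def serialize_with_schema(rows):
    """Schema stored once: a header of column tags, then untagged values."""
    if not rows:
        return b""
    schema = [tag_for(v) for v in rows[0]]
    out = bytearray(struct.pack(">i", len(schema)))  # column count
    out += bytes(schema)                              # one tag per column
    for row in rows:
        if [tag_for(v) for v in row] != schema:
            raise ValueError("row does not match the schema")
        for value in row:
            out += encode_value(value)
    return bytes(out)


rows = [(i, float(i) / 2, "user%d" % i) for i in range(1000)]
tagged = serialize_tagged(rows)
schema_once = serialize_with_schema(rows)
print(len(tagged), len(schema_once), len(tagged) - len(schema_once))
```

With three fields per row, per-field tagging costs one extra byte per field (3 bytes per row), while the schema-once layout pays a fixed 7-byte header. How much this saves in practice depends on the tag width and on how large the field values are relative to the tags.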
--
This message is automatically generated by JIRA.