Posted to dev@pig.apache.org by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org> on 2012/08/23 00:18:42 UTC

[jira] [Updated] (PIG-2888) Improve performance of POPartialAgg

     [ https://issues.apache.org/jira/browse/PIG-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-2888:
-----------------------------------

    Attachment: partialagg_patch_1.patch

The attached patch is a first pass at this implementation. It may be hard to read as a diff, since about 70% of the code in POPartialAgg changed; I recommend applying it to a git branch and looking at the class directly.

I have not implemented memory-based triggering yet; for now the patch relies on hardcoded limits on the number of tuples in the caches.
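
For readers who have not applied the patch, the general idea of hash-based partial aggregation with a fixed size limit can be sketched roughly as follows. This is not code from the patch: the class name PartialAggSketch, the maxKeys parameter, and the use of a running sum as the aggregate are all assumptions made for illustration.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not the POPartialAgg code from the patch. */
final class PartialAggSketch {
    // Assumed hardcoded-style limit on the number of distinct keys held in the cache.
    private final int maxKeys;
    // Running partial aggregate (here: a sum) per key.
    private final Map<String, Long> cache = new HashMap<>();
    // Partial results handed downstream; a key may appear here more than once.
    private final List<Map.Entry<String, Long>> emitted = new ArrayList<>();

    PartialAggSketch(int maxKeys) {
        this.maxKeys = maxKeys;
    }

    /** Folds one value into the cached aggregate; flushes when the cache outgrows the limit. */
    void add(String key, long value) {
        cache.merge(key, value, Long::sum);
        if (cache.size() > maxKeys) {
            flush();
        }
    }

    /** Emits every cached partial aggregate and empties the cache. */
    void flush() {
        for (Map.Entry<String, Long> e : cache.entrySet()) {
            emitted.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), e.getValue()));
        }
        cache.clear();
    }

    List<Map.Entry<String, Long>> emitted() {
        return emitted;
    }
}
```

In a sketch like this, the same key can be emitted in more than one partial result, so a later stage still has to merge partials per key. Memory-based triggering would replace the size check in add() with a check driven by memory usage.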

I have also not implemented the functionality to automatically turn off hash-based aggregation.

All tests pass except those related to memory settings.

Test runs on synthetic data, both in local mode and on a cluster, produced correct results.

Cluster runs indicate significant improvement in overall speed of execution when using this approach.
                
> Improve performance of POPartialAgg
> -----------------------------------
>
>                 Key: PIG-2888
>                 URL: https://issues.apache.org/jira/browse/PIG-2888
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Dmitriy V. Ryaboy
>         Attachments: partialagg_patch_1.patch
>
>
> During performance testing, we found that POPartialAgg can cause performance degradation for Pig jobs when the Algebraic UDFs it's being applied to aren't well suited to the operator's assumptions. Changing the implementation to a more flexible hash-based model can provide significant performance improvements.
