Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2007/12/03 20:25:43 UTC

[jira] Updated: (PIG-7) Optimize execution of algebraic functions

     [ https://issues.apache.org/jira/browse/PIG-7?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-7:
-------------------------

    Attachment: combiner3.patch

The patch combiner3.patch addresses Utkarsh's points. The previous code did not handle a nested func(func()) in the projection, and it did not handle a projection of any shape other than: group, func(), [func()...]. Both of those cases are now explicitly caught.

One note is that the use of the combiner in this patch is fairly restrictive. The user has to have a projection with the group in position 0. We should probably rework this so that the group can either be omitted or moved around. I don't have time to do this now, but it shouldn't be too much work, and it will make the code more flexible to use.
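
The restriction described above amounts to a shape check on the projection. The sketch below uses a hypothetical, much-simplified projection model (`ProjItem`, `GroupRef`, `FieldRef`, `FuncCall`, and `canUseCombiner` are illustrative names, not Pig's internal classes). It only encodes the rule from the comment: the group in position 0, followed by one or more function calls, none of which takes another function call as an argument. A real check would presumably also need to confirm that each function is algebraic; that step is omitted here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of a projection such as: group, SUM(A.x), COUNT(A).
// Illustrative only; these are not Pig's internal classes.
class CombinerEligibility {

    // An item in the projection.
    interface ProjItem {}

    // Reference to the grouping key ("group").
    static final class GroupRef implements ProjItem {}

    // Reference to a plain field or bag column, e.g. "A.x".
    static final class FieldRef implements ProjItem {
        final String name;
        FieldRef(String name) { this.name = name; }
    }

    // A function call such as SUM(A.x); an argument may itself be a call.
    static final class FuncCall implements ProjItem {
        final String name;
        final List<ProjItem> args;
        FuncCall(String name, ProjItem... args) {
            this.name = name;
            this.args = Arrays.asList(args);
        }
    }

    // True only for the shape: group, func(), [func()...],
    // where no function argument is itself a function call (no func(func())).
    static boolean canUseCombiner(List<ProjItem> projection) {
        // Need the group plus at least one function call.
        if (projection.size() < 2) return false;
        // The group must be in position 0.
        if (!(projection.get(0) instanceof GroupRef)) return false;
        for (ProjItem item : projection.subList(1, projection.size())) {
            // Everything after the group must be a function call.
            if (!(item instanceof FuncCall)) return false;
            for (ProjItem arg : ((FuncCall) item).args) {
                // Reject nested calls such as func(func()).
                if (arg instanceof FuncCall) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // group, SUM(A.x), COUNT(A): accepted
        List<ProjItem> ok = Arrays.asList(
            new GroupRef(),
            new FuncCall("SUM", new FieldRef("A.x")),
            new FuncCall("COUNT", new FieldRef("A")));
        // group, SUM(COUNT(A)): rejected because of the nested call
        List<ProjItem> nested = Arrays.asList(
            new GroupRef(),
            new FuncCall("SUM", new FuncCall("COUNT", new FieldRef("A"))));
        // SUM(A.x), group: rejected because the group is not in position 0
        List<ProjItem> moved = Arrays.asList(
            new FuncCall("SUM", new FieldRef("A.x")),
            new GroupRef());
        System.out.println(canUseCombiner(ok));      // true
        System.out.println(canUseCombiner(nested));  // false
        System.out.println(canUseCombiner(moved));   // false
    }
}
```

Relaxing the position-0 requirement, as suggested above, would mean replacing the `projection.get(0)` test with a scan that allows the group anywhere (or nowhere) while still requiring every other item to be a non-nested function call.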

> Optimize execution of algebraic functions
> -----------------------------------------
>
>                 Key: PIG-7
>                 URL: https://issues.apache.org/jira/browse/PIG-7
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
>         Attachments: combiner.patch, combiner2.patch, combiner3.patch
>
>
> Algebraic functions are functions that can be computed incrementally, like COUNT(X), SUM(X), etc. They can be computed efficiently by doing the first-level computation in the Hadoop combiner. This can give a significant (2-3x) speedup for many aggregation queries. 
> Several users asked us for this feature so it is pretty high priority.
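
To make the idea concrete, here is a small standalone Java simulation of why COUNT and SUM fit the combiner. All class and method names (`AlgebraicCombinerDemo`, `Partial`, `combine`, `reduce`) are hypothetical and are not Pig's or Hadoop's API. Each simulated map task's combiner collapses its records into one partial (count, sum) per group key. The reducer then only merges those partials, and because the merge is associative and commutative, the result matches computing COUNT and SUM over all records at once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation only; not Pig's or Hadoop's API.
class AlgebraicCombinerDemo {

    // Partial aggregate state passed from the combiner to the reducer.
    static final class Partial {
        final long count;
        final long sum;
        Partial(long count, long sum) { this.count = count; this.sum = sum; }

        // Merging partials is associative and commutative, which is what
        // lets the work be split between combiner and reducer.
        Partial merge(Partial other) {
            return new Partial(count + other.count, sum + other.sum);
        }
    }

    // Combiner step: runs over one map task's (key, value) records and
    // emits a single Partial per key before the shuffle.
    static Map<String, Partial> combine(List<String[]> records) {
        Map<String, Partial> out = new HashMap<>();
        for (String[] r : records) {
            out.merge(r[0], new Partial(1, Long.parseLong(r[1])), Partial::merge);
        }
        return out;
    }

    // Reduce step: merges the per-map-task partials for each key.
    static Map<String, Partial> reduce(List<Map<String, Partial>> combinerOutputs) {
        Map<String, Partial> out = new HashMap<>();
        for (Map<String, Partial> m : combinerOutputs) {
            for (Map.Entry<String, Partial> e : m.entrySet()) {
                out.merge(e.getKey(), e.getValue(), Partial::merge);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two hypothetical map tasks, each holding (group key, value) records.
        List<String[]> split1 = Arrays.asList(
            new String[]{"a", "1"}, new String[]{"a", "2"}, new String[]{"b", "5"});
        List<String[]> split2 = Arrays.asList(
            new String[]{"a", "3"}, new String[]{"b", "7"}, new String[]{"b", "1"});

        List<Map<String, Partial>> partials = new ArrayList<>();
        partials.add(combine(split1));
        partials.add(combine(split2));

        Map<String, Partial> result = reduce(partials);
        for (String key : Arrays.asList("a", "b")) {
            Partial p = result.get(key);
            System.out.println(key + ": COUNT=" + p.count + " SUM=" + p.sum);
        }
        // Prints:
        // a: COUNT=3 SUM=6
        // b: COUNT=3 SUM=13
    }
}
```

The saving comes from the combiner shrinking each map task's output to one small record per key, so far less data has to be shuffled to the reducers.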
