Posted to dev@pig.apache.org by "Jonathan Coveney (Updated) (JIRA)" <ji...@apache.org> on 2012/03/13 22:24:40 UTC

[jira] [Updated] (PIG-2551) Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which gives you the lower levels for free

     [ https://issues.apache.org/jira/browse/PIG-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2551:
----------------------------------

    Attachment: PIG-2551-1.patch

Thanks for your comments, Julien and Daniel!

All, please find attached the revised patch, per your notes.

- I added comments
- I added a basic heuristic that applies the intermediate EvalFunc in cases where applying it gives a useful reduction in size.
- I added PigCounterHelper to Pig from ElephantBird. Pig is a more reasonable home for it, and it is generally useful: it facilitates logging to Pig from UDFs. I use it to collect stats on the combining activity when an Algebraic UDF is used as an Accumulator.

Also, Daniel, I did some benchmarking per Dmitriy's comment, and I don't see that it's appreciably slower. Here is a benchmark of the accumulator piece on 1M bags:

   AlgSum 14.9 ============================
 AlgCount 15.9 ==============================
      Sum 13.7 =========================
    Count 13.4 =========================

AlgSum and AlgCount are just versions of AlgebraicEvalFunc that return the static classes from LongSum and COUNT, but in this benchmark I called accumulate. I chose accumulate because that is where the function-call overhead is going to be largest.
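
To make the shape of that concrete, here is a rough sketch of how accumulate and exec can be derived from the three algebraic stages. It is a simplified stand-in and not the code in the patch: AlgebraicSketch, AlgSumSketch, AlgCountSketch and COMBINE_THRESHOLD are hypothetical names, plain Java lists stand in for Pig bags and tuples, and the fixed-size threshold is only an assumption in place of the actual size-reduction heuristic.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the idea; NOT Pig's actual AlgebraicEvalFunc.
// I = input element, P = partial result, R = final result.
abstract class AlgebraicSketch<I, P, R> {
    // Hypothetical threshold (an assumption, not the patch's heuristic):
    // fold the buffered partials together once this many have piled up.
    private static final int COMBINE_THRESHOLD = 4;

    private final List<P> partials = new ArrayList<>();

    // The three algebraic stages a UDF author would supply.
    protected abstract P initial(List<I> chunk);
    protected abstract P intermediate(List<P> partials);
    protected abstract R finish(List<P> partials);

    // Accumulator-style entry point: fold one chunk into the running state.
    public void accumulate(List<I> chunk) {
        partials.add(initial(chunk));
        if (partials.size() >= COMBINE_THRESHOLD) {
            P combined = intermediate(partials);
            partials.clear();
            partials.add(combined);
        }
    }

    public R getValue() {
        return finish(partials);
    }

    public void cleanup() {
        partials.clear();
    }

    // EvalFunc-style entry point: the whole bag arrives in a single call.
    public R exec(List<I> bag) {
        accumulate(bag);
        R result = getValue();
        cleanup();
        return result;
    }
}

// Sum: every stage is a plain sum.
class AlgSumSketch extends AlgebraicSketch<Long, Long, Long> {
    private static long sum(List<Long> values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    @Override protected Long initial(List<Long> chunk) { return sum(chunk); }
    @Override protected Long intermediate(List<Long> partials) { return sum(partials); }
    @Override protected Long finish(List<Long> partials) { return sum(partials); }
}

// Count: the initial stage counts elements, later stages sum the counts.
class AlgCountSketch extends AlgebraicSketch<Object, Long, Long> {
    private static long sum(List<Long> values) {
        long total = 0L;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    @Override protected Long initial(List<Object> chunk) { return (long) chunk.size(); }
    @Override protected Long intermediate(List<Long> partials) { return sum(partials); }
    @Override protected Long finish(List<Long> partials) { return sum(partials); }
}
```

In this sketch every accumulate call goes through initial and occasionally intermediate before the final stage runs, which is the extra function-calling layer the benchmark above was meant to measure.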

As you can see, the falloff is minimal, so I don't know that some big disclaimer is necessary (any more than it's necessary to say that Jython UDFs are slower than Java UDFs or whatnot).

For the accumulator eval func, there is no overhead, and a lot of people I know basically do the same thing manually when implementing accumulative UDFs.
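
For illustration, the manual pattern looks roughly like the sketch below: exec is written once in terms of accumulate, getValue and cleanup. Those three method names match Pig's Accumulator interface, but everything else here is a simplified assumption: AccumulatorEvalSketch and MaxSketch are hypothetical names, and plain Java lists stand in for Pig bags and tuples.

```java
import java.util.List;

// Simplified stand-in for the idea; NOT Pig's actual AccumulatorEvalFunc.
abstract class AccumulatorEvalSketch<I, R> {
    public abstract void accumulate(List<I> chunk);
    public abstract R getValue();
    public abstract void cleanup();

    // exec comes for free: treat the whole bag as a single chunk.
    public R exec(List<I> bag) {
        accumulate(bag);
        R result = getValue();
        cleanup();
        return result;
    }
}

// Hypothetical example UDF: tracks the largest value seen so far.
class MaxSketch extends AccumulatorEvalSketch<Long, Long> {
    private Long max = null; // null until the first value arrives

    @Override
    public void accumulate(List<Long> chunk) {
        for (Long v : chunk) {
            if (max == null || v > max) {
                max = v;
            }
        }
    }

    @Override
    public Long getValue() {
        return max;
    }

    @Override
    public void cleanup() {
        max = null;
    }
}
```

Since exec only delegates to the three accumulator methods, the base class adds nothing beyond what a hand-written version would already do.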
                
> Create an AlgebraicEvalFunc and AccumulatorEvalFunc abstract class which gives you the lower levels for free
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2551
>                 URL: https://issues.apache.org/jira/browse/PIG-2551
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Jonathan Coveney
>            Assignee: Jonathan Coveney
>            Priority: Minor
>             Fix For: 0.11
>
>         Attachments: PIG-2551-0.patch, PIG-2551-1.patch
>
>
> This is more of a win for the Algebraic interface than the Accumulator interface, but the idea is that if you implement the Algebraic interface, you should get Accumulator/EvalFunc for free, and if you implement Accumulator, you should get EvalFunc for free. The win is that in cases such as JRuby, you don't have to muck around doing this yourself: you have them implement the algebraic portion, and the rest comes for free (that is where this came from, but I feel it is generally useful enough).
> The next piece of work I'd like to do is an easier way to implement Algebraic UDFs, but then again, my to-do list is huge :) Would love thoughts on this. If it doesn't make it into Pig, it's still going to come in the JRuby stuff, so I thought it'd at least be worth having it separate, tested, and available to everyone.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira