Posted to dev@pig.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2012/08/31 01:08:08 UTC
[jira] [Commented] (PIG-2421) EvalFuncs need redesigned

    [ https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445402#comment-13445402 ] 

Raghu Angadi commented on PIG-2421:
-----------------------------------


 # +1 for making a context available (current UDFContext is not available for UDFs).
    #* Use case: I want to be able to write a UDF 'NullIfMissing()' defined this way: {code}
a = load 'input' as (p:(one, two, three), q:int);
b = foreach a generate NullIfMissing(p);
describe b;
{t: (one: bytearray, two: bytearray, three: bytearray)}
-- NullIfMissing returns:
-- (null, null, null) if 'p' is null
-- (x, y, z), if p == (x, y, z)
-- (x, y, null) if p == (x, y)
{code}
# making conf available (read-only is sufficient, and probably preferred, since a UDF context can be used to store any state).
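
To make the NullIfMissing() use case concrete, here is a rough sketch of just the padding step in plain Java, with no Pig classes involved. The class name, the padToArity helper, and the declaredArity parameter are all hypothetical. The declared arity (3 for 'p' above) is exactly the piece of information the UDF would need to get from a context that exposes the input schema. Dropping fields beyond the declared arity is my own assumption, since the use case does not cover it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: models a Pig tuple as a List<Object> so it runs without Pig.
class NullIfMissingSketch {

    // Returns a list of exactly declaredArity fields.
    // Missing trailing fields (or a null tuple) become nulls.
    // Assumption: fields beyond declaredArity are dropped.
    static List<Object> padToArity(List<Object> fields, int declaredArity) {
        List<Object> out =
            new ArrayList<>(Collections.<Object>nCopies(declaredArity, null));
        if (fields == null) {
            return out; // null tuple: every field is null
        }
        int n = Math.min(fields.size(), declaredArity);
        for (int i = 0; i < n; i++) {
            out.set(i, fields.get(i)); // copy the fields that are present
        }
        return out;
    }

    public static void main(String[] args) {
        // Mirrors the three cases listed in the use case, with arity 3.
        System.out.println(padToArity(null, 3));                                // [null, null, null]
        System.out.println(padToArity(Arrays.<Object>asList("x", "y", "z"), 3)); // [x, y, z]
        System.out.println(padToArity(Arrays.<Object>asList("x", "y"), 3));      // [x, y, null]
    }
}
```

The logic itself is trivial; the hard part today is that the UDF has no clean way to learn declaredArity at exec time, which is why a context would help.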

                
> EvalFuncs need redesigned
> -------------------------
>
>                 Key: PIG-2421
>                 URL: https://issues.apache.org/jira/browse/PIG-2421
>             Project: Pig
>          Issue Type: New Feature
>          Components: impl
>    Affects Versions: 0.11
>            Reporter: Alan Gates
>            Assignee: Alan Gates
>         Attachments: examples.patch, PIG-newudf.patch
>
>
> The current EvalFunc interface (and the associated Algebraic and Accumulator interfaces) has grown unwieldy.  In particular, people have noted the following issues:
> # Writing a UDF requires a lot of boilerplate code.
> # Since UDFs always pass a tuple, users are required to manage their own type checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
> # The separation of Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs in other languages to use those interfaces.
> # There is unused code in the current interface that occasionally causes confusion (e.g. isAsynchronous)
> Any change must be done in a way that allows existing UDFs to continue working essentially forever.
