Posted to dev@pig.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2012/08/31 01:08:08 UTC
[jira] [Commented] (PIG-2421) EvalFuncs need redesigned
[ https://issues.apache.org/jira/browse/PIG-2421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13445402#comment-13445402 ]
Raghu Angadi commented on PIG-2421:
-----------------------------------
# +1 for making a context available (current UDFContext is not available for UDFs).
#* use case: I want to be able to write a UDF 'NullIfMissing()', defined this way: {code}
a = load 'input' as (p:(one, two, three), q:int);
b = foreach a generate NullIfMissing(p);
describe b;
{t: (one: bytearray, two: bytearray, three: bytearray)}
-- NullIfMissing returns:
-- (null, null, null) if p is null
-- (x, y, z) if p == (x, y, z)
-- (x, y, null) if p == (x, y)
{code}
# +1 for making conf available (read-only is sufficient, and probably preferred, since a UDF context can be used to store any state).
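
To make the NullIfMissing use case concrete, here is a minimal Java sketch of the padding logic only. It deliberately does not use any Pig classes: the class name NullIfMissingSketch, the method nullIfMissing, and the declaredArity parameter are all hypothetical. The assumption is that a redesigned UDF context would tell the UDF how many fields the declared input schema has (3 for 'p' above); that number is passed in by hand here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of Pig and not the proposed API.
public class NullIfMissingSketch {

    /**
     * Returns a copy of the given fields, padded with nulls up to declaredArity.
     * A null input is treated as a tuple with no fields at all.
     * Assumption: declaredArity stands in for what a UDF context would report
     * about the declared input schema. Inputs that already have at least
     * declaredArity fields are copied unchanged.
     */
    public static List<Object> nullIfMissing(List<?> fields, int declaredArity) {
        List<Object> out = new ArrayList<>();
        if (fields != null) {
            out.addAll(fields);
        }
        // Fill in whatever the declared schema promises but the data lacks.
        while (out.size() < declaredArity) {
            out.add(null);
        }
        return out;
    }

    public static void main(String[] args) {
        int arity = 3; // p:(one, two, three)
        System.out.println(nullIfMissing(null, arity));                        // [null, null, null]
        System.out.println(nullIfMissing(Arrays.asList("x", "y", "z"), arity)); // [x, y, z]
        System.out.println(nullIfMissing(Arrays.asList("x", "y"), arity));      // [x, y, null]
    }
}
```

The padding itself is trivial; the point of the use case is that the UDF cannot know the value of declaredArity at runtime unless some context exposes the declared schema to it.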
> EvalFuncs need redesigned
> -------------------------
>
> Key: PIG-2421
> URL: https://issues.apache.org/jira/browse/PIG-2421
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.11
> Reporter: Alan Gates
> Assignee: Alan Gates
> Attachments: examples.patch, PIG-newudf.patch
>
>
> The current EvalFunc interface (and associated Algebraic and Accumulator interfaces) has grown unwieldy. In particular, people have noted the following issues:
> # Writing a UDF requires a lot of boilerplate code.
> # Since UDFs always pass a tuple, users are required to manage their own type checking for input.
> # Declaring schemas for output data is confusing.
> # Writing a UDF that accepts multiple different parameters (using getArgToFuncMapping) is confusing.
> # Using Algebraic and Accumulator interfaces often entails duplicating code from the initial implementation.
> # UDF implementors are exposed to the internals of Pig since they have to know when to return a tuple (Initial, Intermediate) and when not to (exec, Final).
> # The separation of Initial, Intermediate, and Final into separate classes forces code duplication and makes it hard for UDFs in other languages to use those interfaces.
> # There is unused code in the current interface that occasionally causes confusion (e.g. isAsynchronous).
> Any change must be done in a way that allows existing UDFs to continue working essentially forever.