Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2015/02/24 00:48:13 UTC

[jira] [Updated] (PIG-3294) Allow Pig use Hive UDFs

     [ https://issues.apache.org/jira/browse/PIG-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-3294:
----------------------------
    Attachment: PIG-3294-before-refactory.patch
                PIG-3294-1.patch

To use it, define HiveUDF/HiveUDTF/HiveUDAF in Pig. The Hive function can be named either by its alias in Hive's FunctionRegistry or by its full class name:
define sin HiveUDF('sin');  -- alias in FunctionRegistry
define sin HiveUDF('org.apache.hadoop.hive.ql.udf.UDFSin'); -- full class name
define explode HiveUDTF('explode');  -- a UDTF maps to a Pig UDF that returns a bag
define avg HiveUDAF('avg');  -- a UDAF maps to a Pig Algebraic UDF
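
As a rough sketch of how these definitions might then be used in a script: the input names 'student' and 'mydata' and their schemas below are hypothetical placeholders, not part of the patch.

```pig
define sin HiveUDF('sin');
define explode HiveUDTF('explode');
define avg HiveUDAF('avg');

-- 'student' and its schema are hypothetical
A = load 'student' as (name:chararray, age:int, gpa:double);
B = foreach A generate sin(gpa);           -- scalar Hive UDF

-- the Hive UDAF is called like any Pig aggregate after a group
C = group A by name;
D = foreach C generate group, avg(A.age);

-- 'mydata' is hypothetical; the UDTF returns a bag, so flatten it into rows
E = load 'mydata' as (a0:{(b0:chararray)});
F = foreach E generate flatten(explode(a0));
```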

Some Hive UDFs require constant parameters. Hive uses ObjectInspector to communicate the schema to a UDF, and ObjectInspector is richer than Pig's Schema in that it can express whether a field is a constant. To support this, HiveUDF takes an optional constant tuple as its second argument. A null item in the tuple means the corresponding parameter is not a constant:

define in_file HiveUDF('in_file', '(null, "names.txt")');
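
A minimal sketch of calling such a UDF, assuming the constant value is still passed at the call site in the position marked as constant in the tuple. The 'student' input and its schema are hypothetical.

```pig
-- first parameter is not a constant (null); second is the constant "names.txt"
define in_file HiveUDF('in_file', '(null, "names.txt")');

-- 'student' and its schema are hypothetical
A = load 'student' as (name:chararray, age:long, gpa:double);
B = foreach A generate in_file(name, 'names.txt');
```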

The patch contains the following changes:
1. Allow a UDF to produce a last record in close(). HiveUDTF uses this to take all the records as input first and produce its output in close().
2. Add the input schema to Initial, Intermed, and Final of an Algebraic UDF. This is the original input schema of the UDF. The actual input schema of each stage is internal knowledge of the Algebraic UDF, and Pig does not know it.
3. Several minor fixes in the combiner:
 * the Tez combiner conf does not have UDFContext
 * parentPlan is not set for combiner plan operators
 * resultType of FINAL is not set properly
4. Refactor OrcUtils -> HiveUtils (the patch from before the refactoring is also attached to ease review)

> Allow Pig use Hive UDFs
> -----------------------
>
>                 Key: PIG-3294
>                 URL: https://issues.apache.org/jira/browse/PIG-3294
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Daniel Dai
>              Labels: gsoc2013, java
>         Attachments: PIG-3294-1.patch, PIG-3294-before-refactory.patch
>
>
> It would be nice if Pig provide some interoperability with Hive. We can wrap Hive UDF in Pig so we can use Hive UDF in Pig.
> This is a candidate project for Google summer of code 2013. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2013



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)