You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "JunHo Cho (JIRA)" <ji...@apache.org> on 2011/08/09 03:26:29 UTC

[jira] [Updated] (HIVE-2361) Add some UDFs which help to migrate Oracle to Hive

     [ https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JunHo Cho updated HIVE-2361:
----------------------------

    Attachment: nexr-udf.tar

>  Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>
>                 Key: HIVE-2361
>                 URL: https://issues.apache.org/jira/browse/HIVE-2361
>             Project: Hive
>          Issue Type: New Feature
>          Components: UDF
>    Affects Versions: 0.8.0
>            Reporter: JunHo Cho
>            Priority: Minor
>              Labels: features
>         Attachments: nexr-udf.tar
>
>
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic function.
> Most scalar functions in Oracle can be converted to hive's udf directly.  
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions 
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira