You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "JunHo Cho (JIRA)" <ji...@apache.org> on 2011/08/09 03:26:29 UTC
[jira] [Created] (HIVE-2361) Add some UDFs which help to migrate
Oracle to Hive
Add some UDFs which help to migrate Oracle to Hive
---------------------------------------------------
Key: HIVE-2361
URL: https://issues.apache.org/jira/browse/HIVE-2361
Project: Hive
Issue Type: New Feature
Components: UDF
Affects Versions: 0.8.0
Reporter: JunHo Cho
Priority: Minor
Attachments: nexr-udf.tar
Here some UDFs which can be matched to oracle functions:
There are two kinds of oracle functions. one is scalar function and another is analytic function.
Most scalar functions in Oracle can be converted to hive's udf directly.
Oracle Scalar Function
GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
Oracle Analytic Function
Most analytic functions in Oracle can't be converted to hive's query and udf directly.
Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions
e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
GenericUDFSum : Calculate a cumulative sum.
GenericUDFRank : Assign a sequential order, or rank within some group based on key.
GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
GenericUDFRowNumber : Return sequence integer value within some group based on key.
GenericUDFMax : Determine the highest value within some group based on key.
GenericUDFMin : Determine the lowest value within some group based on key.
GenericUDFLag : Access data from a previous row.
This udfs was developed with hive-pdk
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate
Oracle to Hive
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13458257#comment-13458257 ]
Carl Steinbach commented on HIVE-2361:
--------------------------------------
@JunHo: It would be great to get these committed. Do you have time to modify the patch so that they are added as standard UDFs? Thanks.
> Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>
> Key: HIVE-2361
> URL: https://issues.apache.org/jira/browse/HIVE-2361
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Affects Versions: 0.8.0
> Reporter: JunHo Cho
> Assignee: JunHo Cho
> Priority: Minor
> Labels: features
> Attachments: nexr-udf.tar
>
>
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic function.
> Most scalar functions in Oracle can be converted to hive's udf directly.
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2361) Add some UDFs which help to migrate
Oracle to Hive
Posted by "JunHo Cho (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
JunHo Cho updated HIVE-2361:
----------------------------
Attachment: nexr-udf.tar
> Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>
> Key: HIVE-2361
> URL: https://issues.apache.org/jira/browse/HIVE-2361
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Affects Versions: 0.8.0
> Reporter: JunHo Cho
> Priority: Minor
> Labels: features
> Attachments: nexr-udf.tar
>
>
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic function.
> Most scalar functions in Oracle can be converted to hive's udf directly.
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2361) Add some UDFs which help to migrate
Oracle to Hive
Posted by "Aniket Mokashi (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13142542#comment-13142542 ]
Aniket Mokashi commented on HIVE-2361:
--------------------------------------
This issue depends on Hive-2524 for better utility functions. I suggest to commit these UDFs on 0.8. I will create a patch for the same.
> Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>
> Key: HIVE-2361
> URL: https://issues.apache.org/jira/browse/HIVE-2361
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Affects Versions: 0.8.0
> Reporter: JunHo Cho
> Priority: Minor
> Labels: features
> Attachments: nexr-udf.tar
>
>
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic function.
> Most scalar functions in Oracle can be converted to hive's udf directly.
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-2361) Add some UDFs which help to migrate
Oracle to Hive
Posted by "John Sichi (Assigned) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi reassigned HIVE-2361:
--------------------------------
Assignee: JunHo Cho
> Add some UDFs which help to migrate Oracle to Hive
> ---------------------------------------------------
>
> Key: HIVE-2361
> URL: https://issues.apache.org/jira/browse/HIVE-2361
> Project: Hive
> Issue Type: New Feature
> Components: UDF
> Affects Versions: 0.8.0
> Reporter: JunHo Cho
> Assignee: JunHo Cho
> Priority: Minor
> Labels: features
> Attachments: nexr-udf.tar
>
>
> Here some UDFs which can be matched to oracle functions:
> There are two kinds of oracle functions. one is scalar function and another is analytic function.
> Most scalar functions in Oracle can be converted to hive's udf directly.
> Oracle Scalar Function
> GenericUDFDecode : Compares first argument to each other value one by one. e.g., DECODE(x,0,'zero',1,'one') will return 'zero' if x is 0
> GenericUDFGreatest : Return the greatest of the list of one or more expressions. e.g., GREATEST(2,5,12,3) will return 12
> GenericUDFInstr : Return the location of a substring in a string. e.g., INSTR('next', 'e') will return 2
> GenericUDFLnnvl : Evaluate a condition when one or both operands of the condition may be null. e.g., LNNVL(2 > 4) will return true
> GenericUDFNVL : Replace null with a string in the results of a query. e.g., NVL(null,'hive') will return hive
> GenericUDFNVL2 : Determine the value returned by a query based on whether a specified expression is null or not null. e.g., NVL2(null,'not null','null value') will return 'null value'
> GenericUDFToNumber : Convert a string to a number. e.g., TO_NUMBER('112','999') will return 112
> GenericUDFTrunc : Returns a date truncated to a specific unit of measure. e.g., TRUNC('2002-11-02 01:01:01','YYYY') will return '2002-01-01 00:00:00'
> Oracle Analytic Function
> Most analytic functions in Oracle can't be converted to hive's query and udf directly.
> Following udfs should be used with DISTRIBUTED, SORT BY and HASH of hive to support analytic functions
> e.q., SELECT _FUNC_(hash(col1), col2, ...) FROM SELECT ~ FROM table DISTRIBUTED BY hash(col1) SORT BY col1, col2 ...
> GenericUDFSum : Calculate a cumulative sum.
> GenericUDFRank : Assign a sequential order, or rank within some group based on key.
> GenericUDFDenseRank : Act like RANK function except that it assigns consecutive ranks.
> GenericUDFRowNumber : Return sequence integer value within some group based on key.
> GenericUDFMax : Determine the highest value within some group based on key.
> GenericUDFMin : Determine the lowest value within some group based on key.
> GenericUDFLag : Access data from a previous row.
> This udfs was developed with hive-pdk
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira