Posted to dev@hive.apache.org by "Thejas M Nair (JIRA)" <ji...@apache.org> on 2013/12/11 02:46:09 UTC

[jira] [Comment Edited] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations

    [ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844942#comment-13844942 ] 

Thejas M Nair edited comment on HIVE-5356 at 12/11/13 1:45 AM:
---------------------------------------------------------------

Here are my concerns with this change (thanks to Jason for highlighting the differences in behavior):

# The changes to floating point arithmetic are not backward compatible, and they bring no SQL compliance benefit.
# Regarding integer division returning decimal:
## It will not be backward compatible with some UDF implementations (I believe the same applies to the change in floating point return type).
## Integer arithmetic becomes NULL in some cases.
## There is a more than 50x performance degradation for the arithmetic operation.

Regarding the drive to make Hive more SQL standard compliant, I believe the motivation behind it is to make it easier to integrate with external
tools and easier for people who are familiar with SQL to use Hive. I am not sure this change helps with either of those two motivations. Most
of the commercial databases (Oracle, SQL Server, DB2, postgres) return an int result for integer division, not a decimal.
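
The two division behaviors being compared can be sketched in plain Java, outside Hive. This is only an illustration: the class and method names are hypothetical, BigDecimal merely stands in for a SQL decimal type, and the sketch does not reproduce Hive's actual decimal rules, its NULL behavior, or the performance numbers mentioned above.

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class IntegerDivisionSketch {
    // Integer division as Java defines it: the fractional part is discarded.
    static int intDivide(int a, int b) {
        return a / b;
    }

    // Decimal-style division. BigDecimal is an assumption used as a stand-in
    // for a SQL decimal type; a MathContext is supplied so that
    // non-terminating quotients such as 1/3 are rounded instead of throwing.
    static BigDecimal decimalDivide(int a, int b) {
        return new BigDecimal(a).divide(new BigDecimal(b), MathContext.DECIMAL64);
    }

    public static void main(String[] args) {
        System.out.println(intDivide(7, 2));      // prints 3
        System.out.println(decimalDivide(7, 2));  // prints 3.5
    }
}
```

A caller that expects an int from `7 / 2` sees a different value and a different type once the operator starts returning a decimal, which is the backward compatibility concern raised in the list above.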





> Move arithmatic UDFs to generic UDF implementations
> ---------------------------------------------------
>
>                 Key: HIVE-5356
>                 URL: https://issues.apache.org/jira/browse/HIVE-5356
>             Project: Hive
>          Issue Type: Task
>          Components: UDF
>    Affects Versions: 0.11.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>             Fix For: 0.13.0
>
>         Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch
>
>
> Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs, and Java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params.
> The Hive decimal type participates in these operations just like int or double. Unlike double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. With the default "decimal" without precision/scale, (10, 0) becomes the type params. This is certainly not desirable.
> To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the "exact" return type can be determined.
> It's worth mentioning that, for user UDFs implemented in a non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as precision/scale, which might not be desirable. This needs to be documented.
> This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706.
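
The point in the description that a declared return type of "decimal" says nothing about precision and scale can be illustrated with a small Java sketch. It uses java.math.BigDecimal as a stand-in, and the class name and the typeOf helper are hypothetical. The resulting precision/scale values follow BigDecimal's rules, which are not necessarily the rules Hive applies.

```java
import java.math.BigDecimal;

public class DecimalTypeParamsSketch {
    // Hypothetical helper: renders a value's precision and scale in a
    // "decimal(p,s)" form for display purposes only.
    static String typeOf(BigDecimal d) {
        return "decimal(" + d.precision() + "," + d.scale() + ")";
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("12.345"); // precision 5, scale 3
        BigDecimal b = new BigDecimal("6.7");    // precision 2, scale 1

        // Both results have the same static Java type (BigDecimal), yet their
        // precision and scale differ and depend on the operands.
        System.out.println(typeOf(a.add(b)));       // prints decimal(5,3)
        System.out.println(typeOf(a.multiply(b)));  // prints decimal(6,4)
    }
}
```

Reflection on a method signature sees only the static type, so the parameters have to come from somewhere that knows the operand types, which is the role the description assigns to the ObjectInspector returned from initialize().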


