You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2013/12/10 06:36:06 UTC

[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result

    [ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13843972#comment-13843972 ] 

Xuefu Zhang commented on HIVE-5996:
-----------------------------------

For information, the follow is the text from SQL-92[1] standard w.r.t SUM function:
{quote}
            b) If SUM is specified and DT is exact numeric with scale
              S, then the data type of the result is exact numeric with
              implementation-defined precision and scale S.
{quote}
For DT as long, currently Hive returns long, which doesn't violate the standard. However, such implementation is problematic as demonstrated in this JIRA. Plus, for decimal, Hive sum function accommodates at least 10 billion rows of data. Letting sum(long) return long is not able to uphold that assurance. Thus, we need to change the return type to make the function more useful.

[1] http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt


> Query for sum of a long column of a table with only two rows produces wrong result
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-5996
>                 URL: https://issues.apache.org/jira/browse/HIVE-5996
>             Project: Hive
>          Issue Type: Bug
>          Components: UDF
>    Affects Versions: 0.12.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>
> {code}
> hive> desc test2;
> OK
> l                   	bigint              	None                
> hive> select * from test2;                                 
> OK
> 6666666666666666666
> 5555555555555555555
> hive> select sum(l) from test2;
> OK
> -6224521851487329395
> {code}
> It's believed that a wrap-around error occurred. It's surprising that it happens only with two rows. Same query in MySQL returns:
> {code}
> mysql> select sum(l) from test;
> +----------------------+
> | sum(l)               |
> +----------------------+
> | 12222222222222222221 |
> +----------------------+
> 1 row in set (0.00 sec)
> {code}
> Hive should accommodate large number of rows. Overflowing with only two rows is very unusable.



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)