Posted to dev@hive.apache.org by "Xuefu Zhang (JIRA)" <ji...@apache.org> on 2013/12/05 18:48:37 UTC

[jira] [Updated] (HIVE-5878) Hive standard avg UDAF returns double as the return type for some exact input types

     [ https://issues.apache.org/jira/browse/HIVE-5878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xuefu Zhang updated HIVE-5878:
------------------------------

    Description: 
For the final (non-partial) result of the standard avg UDAF, Hive currently returns double as the result type, even when the input is an exact type such as int:
{code}
hive> desc test;
OK
d                   	int                 	None                
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> explain select avg(`d`) from test;  
...
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: avg(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: double
{code}
However, exact input types, including integer types and decimal, should yield an exact result type. Here is what MySQL does:
{code}
mysql> desc test;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| i     | int(11)      | YES  |     | NULL    |       |
| b     | tinyint(1)   | YES  |     | NULL    |       |
| d     | double       | YES  |     | NULL    |       |
| s     | varchar(5)   | YES  |     | NULL    |       |
| dd    | decimal(5,2) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
mysql> create table test62 as select avg(i) from test;
mysql> desc test62;
+--------+---------------+------+-----+---------+-------+
| Field  | Type          | Null | Key | Default | Extra |
+--------+---------------+------+-----+---------+-------+
| avg(i) | decimal(14,4) | YES  |     | NULL    |       |
+--------+---------------+------+-----+---------+-------+
1 row in set (0.00 sec)
{code}
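
To illustrate why an exact result type matters, here is a minimal Python sketch (not Hive or MySQL code) that averages the same integers once in binary floating point and once in exact decimal arithmetic. The function names are hypothetical, and the scale of 4 is only chosen to mirror the decimal(14,4) in the MySQL output above.

```python
from decimal import Decimal, ROUND_HALF_UP


def avg_as_double(values):
    """Average in binary floating point, as a double-typed result would be computed."""
    total = 0.0
    for v in values:
        total += float(v)  # each integer is rounded to the nearest double here
    return total / len(values)


def avg_as_decimal(values, scale=4):
    """Average in exact decimal arithmetic, then round to `scale` fractional digits."""
    total = sum(Decimal(v) for v in values)
    quantum = Decimal(1).scaleb(-scale)  # Decimal('1E-4') when scale is 4
    return (total / len(values)).quantize(quantum, rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 2**53 + 1 is the smallest positive integer a double cannot represent.
    big = [2**53 + 1, 2**53 + 1]
    print(avg_as_double(big))   # 9007199254740992.0 (off by one)
    print(avg_as_decimal(big))  # 9007199254740993.0000 (exact)
```

For small int values the two results agree numerically; the difference shows up in the declared result type and for large values such as the bigint-sized ones above.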

  was:
For the final (non-partial) result of the standard avg UDAF, Hive currently returns double as the result type, even when the input is an exact type such as int:
{code}
hive> desc test;
OK
d                   	int                 	None                
Time taken: 0.051 seconds, Fetched: 1 row(s)
hive> explain select avg(`d`) from test;  
...
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: avg(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: double
{code}
However, exact input types, including integer types and decimal, should yield an exact result type. Here is what MySQL does:
{code}
mysql> desc test;
+-------+--------------+------+-----+---------+-------+
| Field | Type         | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| i     | int(11)      | YES  |     | NULL    |       |
| b     | tinyint(1)   | YES  |     | NULL    |       |
| d     | double       | YES  |     | NULL    |       |
| s     | varchar(5)   | YES  |     | NULL    |       |
| dd    | decimal(5,2) | YES  |     | NULL    |       |
+-------+--------------+------+-----+---------+-------+
mysql> desc test62;
+-------+---------------+------+-----+---------+-------+
| Field | Type          | Null | Key | Default | Extra |
+-------+---------------+------+-----+---------+-------+
| sum_t | decimal(14,4) | YES  |     | NULL    |       |
+-------+---------------+------+-----+---------+-------+
1 row in set (0.00 sec)
{code}


> Hive standard avg UDAF returns double as the return type for some exact input types
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-5878
>                 URL: https://issues.apache.org/jira/browse/HIVE-5878
>             Project: Hive
>          Issue Type: Bug
>          Components: Types, UDF
>    Affects Versions: 0.12.0
>            Reporter: Xuefu Zhang
>            Assignee: Xuefu Zhang
>         Attachments: HIVE-5878.patch
>
>
> For the final (non-partial) result of the standard avg UDAF, Hive currently returns double as the result type, even when the input is an exact type such as int:
> {code}
> hive> desc test;
> OK
> d                   	int                 	None                
> Time taken: 0.051 seconds, Fetched: 1 row(s)
> hive> explain select avg(`d`) from test;  
> ...
>       Reduce Operator Tree:
>         Group By Operator
>           aggregations:
>                 expr: avg(VALUE._col0)
>           bucketGroup: false
>           mode: mergepartial
>           outputColumnNames: _col0
>           Select Operator
>             expressions:
>                   expr: _col0
>                   type: double
> {code}
> However, exact input types, including integer types and decimal, should yield an exact result type. Here is what MySQL does:
> {code}
> mysql> desc test;
> +-------+--------------+------+-----+---------+-------+
> | Field | Type         | Null | Key | Default | Extra |
> +-------+--------------+------+-----+---------+-------+
> | i     | int(11)      | YES  |     | NULL    |       |
> | b     | tinyint(1)   | YES  |     | NULL    |       |
> | d     | double       | YES  |     | NULL    |       |
> | s     | varchar(5)   | YES  |     | NULL    |       |
> | dd    | decimal(5,2) | YES  |     | NULL    |       |
> +-------+--------------+------+-----+---------+-------+
> mysql> create table test62 as select avg(i) from test;
> mysql> desc test62;
> +--------+---------------+------+-----+---------+-------+
> | Field  | Type          | Null | Key | Default | Extra |
> +--------+---------------+------+-----+---------+-------+
> | avg(i) | decimal(14,4) | YES  |     | NULL    |       |
> +--------+---------------+------+-----+---------+-------+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.1#6144)