You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Guillaume Polaert <gp...@cyres.fr> on 2012/06/11 11:49:40 UTC

Trouble with sum function

Hi,

We're expecting some issue with the sum function in Hive 0.7.1.
The precision of float number isn't correct (0.320484484676 instead of 0.32)
We aren't expecting this error with double format.

For instance, "select id, sum(col1), sum(col2) from test_table group by id" returns incorrect values.

------
CREATE TABLE test_table  ( id int,  col1 double,  col2 float )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/hive/xxxxx/import.csv' INTO TABLE `test _table`

Import.csv
1;1.012;1.012
2;2.1;2.1
3;3.2;3.2
4;4.323;4.323
5;5;5
----

Is it a bug  ? Or maybe did we make a error ?


Guillaume Polaert | Cyrès Conseil

RE: Trouble with sum function

Posted by Guillaume Polaert <gp...@cyres.fr>.

Thanks you, I didn't know about it.

Guillaume Polaert | Cyrès Conseil

De : Gabi D [mailto:gabid33@gmail.com]
Envoyé : lundi 11 juin 2012 12:14
À : user@hive.apache.org
Cc : Matouk Iftissen
Objet : Re: Trouble with sum function

float is known to have precision issues, because of the way it is implemented. If you are working with money data you should definitely move to double.
google 'float precision' and you'll find a bunch of explanations.
On Mon, Jun 11, 2012 at 12:49 PM, Guillaume Polaert <gp...@cyres.fr>> wrote:
Hi,

We're expecting some issue with the sum function in Hive 0.7.1.
The precision of float number isn't correct (0.320484484676 instead of 0.32)
We aren't expecting this error with double format.

For instance, "select id, sum(col1), sum(col2) from test_table group by id" returns incorrect values.

------
CREATE TABLE test_table  ( id int,  col1 double,  col2 float )
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

LOAD DATA INPATH '/user/hive/xxxxx/import.csv' INTO TABLE `test _table`

Import.csv
1;1.012;1.012
2;2.1;2.1
3;3.2;3.2
4;4.323;4.323
5;5;5
----

Is it a bug  ? Or maybe did we make a error ?


Guillaume Polaert | Cyrès Conseil

Re: Trouble with sum function

Posted by Jagat Singh <ja...@gmail.com>.

>From the code here

http://svn.apache.org/viewvc/hive/branches/branch-0.7/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java?view=markup

For float , doble and string the implementation points to common function
GenericUDAFSumDouble()

if (parameters[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
54     throw new UDFArgumentTypeException(0,
55     "Only primitive type arguments are accepted but "
56     + parameters[0].getTypeName() + " is passed.");
57     }
58     switch (((PrimitiveTypeInfo) parameters[0]).getPrimitiveCategory()) {
59     case BYTE:
60     case SHORT:
61     case INT:
62     case LONG:
63     return new GenericUDAFSumLong();
*64  *   *case FLOAT:
65     case DOUBLE:
66     case STRING:
67     return new GenericUDAFSumDouble();*



On Mon, Jun 11, 2012 at 3:43 PM, Gabi D <ga...@gmail.com> wrote:

> float is known to have precision issues, because of the way it is
> implemented. If you are working with money data you should definitely move
> to double.
> google 'float precision' and you'll find a bunch of explanations.
>
>
> On Mon, Jun 11, 2012 at 12:49 PM, Guillaume Polaert <gp...@cyres.fr>wrote:
>
>> Hi,
>>
>> We're expecting some issue with the sum function in Hive 0.7.1.
>> The precision of float number isn't correct (0.320484484676 instead of
>> 0.32)
>> We aren't expecting this error with double format.
>>
>> For instance, "select id, sum(col1), sum(col2) from test_table group by
>> id" returns incorrect values.
>>
>> ------
>> CREATE TABLE test_table  ( id int,  col1 double,  col2 float )
>> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
>> LINES TERMINATED BY '\n'
>> STORED AS TEXTFILE;
>>
>> LOAD DATA INPATH '/user/hive/xxxxx/import.csv' INTO TABLE `test _table`
>>
>> Import.csv
>> 1;1.012;1.012
>> 2;2.1;2.1
>> 3;3.2;3.2
>> 4;4.323;4.323
>> 5;5;5
>> ----
>>
>> Is it a bug  ? Or maybe did we make a error ?
>>
>>
>> Guillaume Polaert | Cyrès Conseil
>>
>>
>

Re: Trouble with sum function

Posted by Gabi D <ga...@gmail.com>.

float is known to have precision issues, because of the way it is
implemented. If you are working with money data you should definitely move
to double.
google 'float precision' and you'll find a bunch of explanations.

On Mon, Jun 11, 2012 at 12:49 PM, Guillaume Polaert <gp...@cyres.fr>wrote:

> Hi,
>
> We're expecting some issue with the sum function in Hive 0.7.1.
> The precision of float number isn't correct (0.320484484676 instead of
> 0.32)
> We aren't expecting this error with double format.
>
> For instance, "select id, sum(col1), sum(col2) from test_table group by
> id" returns incorrect values.
>
> ------
> CREATE TABLE test_table  ( id int,  col1 double,  col2 float )
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\073'
> LINES TERMINATED BY '\n'
> STORED AS TEXTFILE;
>
> LOAD DATA INPATH '/user/hive/xxxxx/import.csv' INTO TABLE `test _table`
>
> Import.csv
> 1;1.012;1.012
> 2;2.1;2.1
> 3;3.2;3.2
> 4;4.323;4.323
> 5;5;5
> ----
>
> Is it a bug  ? Or maybe did we make a error ?
>
>
> Guillaume Polaert | Cyrès Conseil
>
>