Posted to issues@hive.apache.org by "Andrew Sherman (JIRA)" <ji...@apache.org> on 2017/07/27 17:33:00 UTC

[jira] [Commented] (HIVE-17186) `double` type constant operation loses precision

    [ https://issues.apache.org/jira/browse/HIVE-17186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103551#comment-16103551 ] 

Andrew Sherman commented on HIVE-17186:
---------------------------------------

This looks like an artifact of floating point arithmetic:

{noformat}
double d1 = 0.06D;
double d2 = 0.01D;
double d3 = d1 + d2;
double d4 = d1 - d2;
System.out.println("d3 = " + d3);
System.out.println("d4 = " + d4);
{noformat}
gives
{noformat}
d3 = 0.06999999999999999
d4 = 0.049999999999999996
{noformat}
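
For anyone who wants to dig further, here is a minimal standalone sketch (the class name is mine, not anything in Hive) that first prints the exact binary values actually stored for the 0.06 and 0.01 double literals, and then repeats the same addition and subtraction with java.math.BigDecimal, which does exact decimal arithmetic and so produces the intuitive 0.07 and 0.05 bounds:

{noformat}
import java.math.BigDecimal;

public class DoublePrecisionDemo {
    public static void main(String[] args) {
        // new BigDecimal(double) shows the exact binary value the literal is stored as;
        // neither 0.06 nor 0.01 is exactly representable in binary floating point.
        System.out.println("stored 0.06 = " + new BigDecimal(0.06));
        System.out.println("stored 0.01 = " + new BigDecimal(0.01));

        // Constructing from the decimal strings and doing exact decimal arithmetic
        // gives the bounds the user expects.
        BigDecimal d1 = new BigDecimal("0.06");
        BigDecimal d2 = new BigDecimal("0.01");
        System.out.println("d1 + d2 = " + d1.add(d2));      // 0.07
        System.out.println("d1 - d2 = " + d1.subtract(d2)); // 0.05
    }
}
{noformat}

So the BETWEEN bounds shown in the plan below are the expected result of folding the constants as doubles; getting 0.05 and 0.07 would presumably require folding them as DECIMAL instead.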

> `double` type constant operation loses precision
> ------------------------------------------------
>
>                 Key: HIVE-17186
>                 URL: https://issues.apache.org/jira/browse/HIVE-17186
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Dongjoon Hyun
>
> This might be an issue where Hive loses precision and generates a wrong result when handling *double* constant operations. This was reported in the following environment.
> *ENVIRONMENT*
> https://github.com/hortonworks/hive-testbench/blob/hive14/tpch-gen/ddl/orc.sql
> *SQL*
> {code}
> hive> explain select l_discount from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> Plan not optimized by CBO.
> Stage-0
>    Fetch Operator
>       limit:10
>       Stage-1
>          Map 1 vectorized
>          File Output Operator [FS_9]
>             compressed:false
>             Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
>             Limit [LIM_8]
>                Number of rows:10
>                Statistics:Num rows: 10 Data size: 80 Basic stats: COMPLETE Column stats: COMPLETE
>                Select Operator [OP_7]
>                   outputColumnNames:["_col0"]
>                   Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                   Filter Operator [FIL_6]
>                      predicate:l_discount BETWEEN 0.049999999999999996 AND 0.06999999999999999 (type: boolean)
>                      Statistics:Num rows: 2999994854 Data size: 23999958832 Basic stats: COMPLETE Column stats: COMPLETE
>                      TableScan [TS_0]
>                         alias:lineitem
>                         Statistics:Num rows: 5999989709 Data size: 4832986297043 Basic stats: COMPLETE Column stats: COMPLETE
> hive> select max(l_discount) from lineitem where l_discount between 0.06 - 0.01 and 0.06 + 0.01 limit 10;
> OK
> 0.06
> Time taken: 314.923 seconds, Fetched: 1 row(s)
> {code}
> Hive excludes 0.07, which differs from users' intuition. This difference also confuses some users, because they believe that Hive's result is the correct one. Is there any way for Hive to fix this?



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)