You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Kristine Hahn (JIRA)" <ji...@apache.org> on 2015/03/13 02:19:38 UTC

[jira] [Resolved] (DRILL-2431) Document behavior of floating point types

     [ https://issues.apache.org/jira/browse/DRILL-2431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kristine Hahn resolved DRILL-2431.
----------------------------------
    Resolution: Fixed

Done, added the following section to the docs:
{quote}
**Guidelines for Using Float and Double**
The data types float and double yield approximate results. These are variable-precision numeric types. Drill does not cast/convert all values precisely to the internal format, but instead stores approximations. Slight differences can occur in the value stored and retrieved. The following guidelines are recommended:

* For conversions involving monetary calculations, for example, that require precise results use the decimal type instead of float or double.
* For complex calculations or mission-critical applications, especially those involving infinity and underflow situations, carefully consider the limitations of type casting that involves float or double.
* Equality comparisons between floating-point values can produce unexpected results.

Values of float and double that exceed or fall short of the specified range cause an error. Rounding can occur if the precision of an input number is too high. Numbers approaching zero that Drill cannot distinguish from zero cause an underflow error.
{quote}

> Document behavior of floating point types
> -----------------------------------------
>
>                 Key: DRILL-2431
>                 URL: https://issues.apache.org/jira/browse/DRILL-2431
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Documentation
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Kristine Hahn
>
>    Joining on columns of float and double data type produces confusing result. Drill returns the same result as postgres. Part of me feels that we should not follow postgres blindly in this case, removing implicit cast between float and double would be better choice.
>    At a minimum we should have a section in our documentation that discusses floating point types and this is a good example on how things can go wrong if user does not understand the behavior.
> Example of such a discussion in Postgres docs: http://www.postgresql.org/docs/9.1/static/datatype-numeric.html
> t1.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> t2.csv
> {code}
> 997322.0399,997322.0399
> 982209.1438,982209.1438
> 997322,997322
> 982209,982209
> 963548,963548
> 959310,959310
> {code}
> {code}
> create table t1(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
> from `t1.csv`;
> create table t2(c_float, c_double) as
> select
>         case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
>         case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
> from `t2.csv`;
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.05 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+
> |  c_float   |  c_double  |
> +------------+------------+
> | 997322.06  | 997322.0399 |
> | 982209.1   | 982209.1438 |
> | 997322.0   | 997322.0   |
> | 982209.0   | 982209.0   |
> | 963548.0   | 963548.0   |
> | 959310.0   | 959310.0   |
> +------------+------------+
> 6 rows selected (0.044 seconds)
> {code}
> Implicit cast: looks incorrect, but in fact we can't expect this to work.
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.127 seconds)
> {code}
> Explicit cast: same
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as double) = t2.c_double;
> +------------+------------+------------+------------+
> |  c_float   |  c_double  |  c_float0  | c_double0  |
> +------------+------------+------------+------------+
> | 959310.0   | 959310.0   | 959310.0   | 959310.0   |
> | 963548.0   | 963548.0   | 963548.0   | 963548.0   |
> | 982209.0   | 982209.0   | 982209.0   | 982209.0   |
> | 997322.0   | 997322.0   | 997322.0   | 997322.0   |
> +------------+------------+------------+------------+
> 4 rows selected (0.136 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)