You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/03/11 18:05:39 UTC

[jira] [Created] (DRILL-2431) Document behavior of floating point types

Victoria Markman created DRILL-2431:
---------------------------------------

             Summary: Document behavior of floating point types
                 Key: DRILL-2431
                 URL: https://issues.apache.org/jira/browse/DRILL-2431
             Project: Apache Drill
          Issue Type: Bug
          Components: Documentation
    Affects Versions: 0.8.0
            Reporter: Victoria Markman
            Assignee: Bridget Bevens


   Joining on columns of float and double data type produces confusing result. Drill returns the same result as postgres. Part of me feels that we should not follow postgres blindly in this case, removing implicit cast between float and double would be better choice.
   At a minimum we should have a section in our documentation that discusses floating point types and this is a good example on how things can go wrong if user does not understand the behavior.

Example of such a discussion in Postgres docs: http://www.postgresql.org/docs/9.1/static/datatype-numeric.html

t1.csv
{code}
997322.0399,997322.0399
982209.1438,982209.1438
997322,997322
982209,982209
963548,963548
959310,959310
{code}

t2.csv
{code}
997322.0399,997322.0399
982209.1438,982209.1438
997322,997322
982209,982209
963548,963548
959310,959310
{code}

{code}
create table t1(c_float, c_double) as
select
        case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
        case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
from `t1.csv`;

create table t2(c_float, c_double) as
select
        case when columns[0] = '' then cast(null as float) else cast(columns[0] as float) end,
        case when columns[1] = '' then cast(null as double) else cast(columns[1] as double) end
from `t2.csv`;

0: jdbc:drill:schema=dfs> select * from t1;
+------------+------------+
|  c_float   |  c_double  |
+------------+------------+
| 997322.06  | 997322.0399 |
| 982209.1   | 982209.1438 |
| 997322.0   | 997322.0   |
| 982209.0   | 982209.0   |
| 963548.0   | 963548.0   |
| 959310.0   | 959310.0   |
+------------+------------+
6 rows selected (0.05 seconds)

0: jdbc:drill:schema=dfs> select * from t2;
+------------+------------+
|  c_float   |  c_double  |
+------------+------------+
| 997322.06  | 997322.0399 |
| 982209.1   | 982209.1438 |
| 997322.0   | 997322.0   |
| 982209.0   | 982209.0   |
| 963548.0   | 963548.0   |
| 959310.0   | 959310.0   |
+------------+------------+
6 rows selected (0.044 seconds)
{code}

Implicit cast: looks incorrect, but in fact we can't expect this to work.
{code}
0: jdbc:drill:schema=dfs> select * from t1, t2 where t1.c_float = t2.c_double;
+------------+------------+------------+------------+
|  c_float   |  c_double  |  c_float0  | c_double0  |
+------------+------------+------------+------------+
| 959310.0   | 959310.0   | 959310.0   | 959310.0   |
| 963548.0   | 963548.0   | 963548.0   | 963548.0   |
| 982209.0   | 982209.0   | 982209.0   | 982209.0   |
| 997322.0   | 997322.0   | 997322.0   | 997322.0   |
+------------+------------+------------+------------+
4 rows selected (0.127 seconds)
{code}

Explicit cast: same
{code}
0: jdbc:drill:schema=dfs> select * from t1, t2 where cast(t1.c_float as double) = t2.c_double;
+------------+------------+------------+------------+
|  c_float   |  c_double  |  c_float0  | c_double0  |
+------------+------------+------------+------------+
| 959310.0   | 959310.0   | 959310.0   | 959310.0   |
| 963548.0   | 963548.0   | 963548.0   | 963548.0   |
| 982209.0   | 982209.0   | 982209.0   | 982209.0   |
| 997322.0   | 997322.0   | 997322.0   | 997322.0   |
+------------+------------+------------+------------+
4 rows selected (0.136 seconds)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)