Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2017/12/07 23:53:00 UTC

[jira] [Created] (IMPALA-6295) Inconsistent handling of 'nan' and 'inf' with min/max analytic fns

Thomas Tauber-Marshall created IMPALA-6295:
----------------------------------------------

             Summary: Inconsistent handling of 'nan' and 'inf' with min/max analytic fns
                 Key: IMPALA-6295
                 URL: https://issues.apache.org/jira/browse/IMPALA-6295
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.11.0
            Reporter: Thomas Tauber-Marshall
            Priority: Critical


Incorrect results are returned in some cases where 'nan'/'inf' are the only values in the group and codegen is enabled:
{noformat}
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0

> select * from test1 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 3    | 0         |
| 1    | Infinity  |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test1 group by col0 order by col0
+------+------------------------+
| col0 | min(col1)              |
+------+------------------------+
| 0    | 1.797693134862316e+308 |
| 1    | 1.797693134862316e+308 |
| 2    | -Infinity              |
| 3    | 0                      |
+------+------------------------+

> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test1 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | NaN       |
| 1    | Infinity  |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test1 group by col0 order by col0
+------+-------------------------+
| col0 | max(col1)               |
+------+-------------------------+
| 0    | -1.797693134862316e+308 |
| 1    | Infinity                |
| 2    | -1.797693134862316e+308 |
| 3    | 0                       |
+------+-------------------------+
{noformat}

We also appear never to return 'nan' as a min or max value, despite sorting it as the lowest value when ordering a table (perhaps this is the intended behavior?):
{noformat}
> set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> select * from test2 order by col1
+------+-----------+
| col0 | col1      |
+------+-----------+
| 0    | NaN       |
| 2    | -Infinity |
| 0    | 0         |
| 3    | 0         |
| 1    | 1         |
| 2    | 2         |
| 3    | 3         |
| 1    | Infinity  |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, min(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | min(col1) |
+------+-----------+
| 0    | 0         |
| 1    | 1         |
| 2    | -Infinity |
| 3    | 0         |
+------+-----------+

> set DISABLE_CODEGEN set to true
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+

> set DISABLE_CODEGEN set to false
> select col0, max(col1) from test2 group by col0 order by col0
+------+-----------+
| col0 | max(col1) |
+------+-----------+
| 0    | 0         |
| 1    | Infinity  |
| 2    | 2         |
| 3    | 3         |
+------+-----------+
{noformat}

Changing LlvmCodeGen::CodegenMinMax to use the ordered OLT/OGT float comparisons appears to fix the first case (at least for 'nan'), but causes us to return 'nan' as a max value in the second case.
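
To illustrate the trade-off, here is a minimal standalone C++ model of ordered versus unordered comparisons in a select-style min/max update. It is not Impala's code. The helper names, the "keep dst if cmp(dst, src), else take src" shape, and the +/-DBL_MAX starting values are assumptions inferred from the output above. In LLVM terms, an ordered predicate (OLT/OGT) is false when either operand is NaN, while an unordered one (ULT/UGT) is true.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Ordered predicates (LLVM OLT/OGT semantics): false if either operand is NaN.
inline bool OrderedLess(double a, double b) { return a < b; }
inline bool OrderedGreater(double a, double b) { return a > b; }

// Unordered predicates (LLVM ULT/UGT semantics): true if either operand is NaN.
inline bool UnorderedLess(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a < b;
}
inline bool UnorderedGreater(double a, double b) {
  return std::isnan(a) || std::isnan(b) || a > b;
}

// Hypothetical update step: keep dst when cmp(dst, src) holds, else take src.
template <typename Cmp>
double Update(double dst, double src, Cmp cmp) {
  return cmp(dst, src) ? dst : src;
}

// Assumed starting values for the min() and max() slots.
constexpr double kMinInit = std::numeric_limits<double>::max();
constexpr double kMaxInit = -std::numeric_limits<double>::max();

// Replays the single-value groups from the report against this model.
inline void CheckReportedCases() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  const double inf = std::numeric_limits<double>::infinity();

  // Unordered compare: a lone NaN never replaces the starting value.
  assert(Update(kMinInit, nan, UnorderedLess) == kMinInit);
  assert(Update(kMaxInit, nan, UnorderedGreater) == kMaxInit);

  // Ordered compare: a lone NaN is now returned...
  assert(std::isnan(Update(kMinInit, nan, OrderedLess)));
  // ...but a lone Infinity still loses to the finite starting value...
  assert(Update(kMinInit, inf, OrderedLess) == kMinInit);
  // ...and a NaN arriving after a real value replaces it in max().
  assert(std::isnan(Update(0.0, nan, OrderedGreater)));
}
```

Under these assumptions the model reproduces the 1.797693134862316e+308 results for the single-value groups, and it suggests that the 'inf' cases come from the finite starting value rather than from the choice of comparison predicate.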


