Posted to issues@impala.apache.org by "Thomas Tauber-Marshall (JIRA)" <ji...@apache.org> on 2018/01/04 18:02:00 UTC
[jira] [Resolved] (IMPALA-6295) Inconsistent handling of 'nan' and 'inf' with min/max analytic fns
[ https://issues.apache.org/jira/browse/IMPALA-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Thomas Tauber-Marshall resolved IMPALA-6295.
--------------------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.12.0
commit 96b976aff38de29619ca97dabc47566382e90bf8
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Date: Thu Dec 14 16:40:55 2017 -0800
IMPALA-6295: Fix min/max handling of 'nan' and 'inf'
This patch fixes several issues related to the min/max aggregate
functions and their handling of 'nan' and 'inf':
- Previously, if 'inf' or '-inf' was the only value for the min/max
and codegen was being used, the result would be incorrect. This
occurred, for example in the case of 'inf' and 'min', because we
set an initial value of numeric_limits::max, which is less than
'inf', so the returned min was numeric_limits::max when it should be
'inf'. The fix is to set the initial value to
numeric_limits::infinity.
- Previously, if one of the values was 'nan', the result of min/max
was non-deterministic, depending on the order in which the values
were evaluated. This occurred because 'nan' < or > 'any value' is
always false, so if the first value added was 'nan', all other
comparisons would be false and 'nan' would be returned, whereas if
the first value wasn't 'nan' then the 'nan' would never be returned.
The fix is to treat 'nan' specially and to always return 'nan' if
at least one of the values is 'nan'.
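The two problems above can be reproduced outside Impala with a few lines of
standard C++. The following is a minimal sketch, not the code from the patch:
the helper names (NaiveMin, FirstSeededMin, FixedMin) are hypothetical and
exist only for illustration, and empty groups (which SQL would report as
NULL) are not modeled.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Hypothetical helper: min seeded with the largest finite double.
// A group containing only 'inf' never replaces the seed, because
// inf < max() is false, so the seed leaks out as the result.
double NaiveMin(const std::vector<double>& vals) {
  double result = std::numeric_limits<double>::max();
  for (double v : vals) {
    if (v < result) result = v;
  }
  return result;
}

// Hypothetical helper: min seeded with the first value.
// Precondition: vals is non-empty. Every comparison against 'nan' is
// false, so the result depends on where the 'nan' sits in the input.
double FirstSeededMin(const std::vector<double>& vals) {
  double result = vals.front();
  for (double v : vals) {
    if (v < result) result = v;
  }
  return result;
}

// Hypothetical helper following the approach the commit message
// describes: seed with infinity and return 'nan' as soon as one is seen.
double FixedMin(const std::vector<double>& vals) {
  double result = std::numeric_limits<double>::infinity();
  for (double v : vals) {
    if (std::isnan(v)) return v;
    if (v < result) result = v;
  }
  return result;
}

// Small self-check of the behaviors described in the comments above.
void CheckMinSketch() {
  const double inf = std::numeric_limits<double>::infinity();
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(NaiveMin({inf}) == std::numeric_limits<double>::max());
  assert(FixedMin({inf}) == inf);
  assert(std::isnan(FirstSeededMin({nan, 1.0})));
  assert(FirstSeededMin({1.0, nan}) == 1.0);
  assert(std::isnan(FixedMin({1.0, nan})));
}
```

The max case is symmetric under the same assumptions: seed with negative
infinity instead of numeric_limits::lowest and keep the same 'nan' check.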
Testing:
- Added e2e tests for both scenarios, as well as adding a little extra
nan/inf coverage for other aggregate functions.
Change-Id: Ia1e206105937ce5afc75ca5044597d39b3dc6a81
Reviewed-on: http://gerrit.cloudera.org:8080/8854
Reviewed-by: Bikramjeet Vig <bi...@cloudera.com>
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins
> Inconsistent handling of 'nan' and 'inf' with min/max analytic fns
> ------------------------------------------------------------------
>
> Key: IMPALA-6295
> URL: https://issues.apache.org/jira/browse/IMPALA-6295
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.11.0
> Reporter: Thomas Tauber-Marshall
> Assignee: Thomas Tauber-Marshall
> Priority: Critical
> Labels: codegen, correctness
> Fix For: Impala 2.12.0
>
>
> Incorrect results are returned in some cases where 'nan'/'inf' are the only values in the group and codegen is enabled:
> {noformat}
> > set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> > select * from test1 order by col1
> +------+-----------+
> | col0 | col1 |
> +------+-----------+
> | 0 | NaN |
> | 2 | -Infinity |
> | 3 | 0 |
> | 1 | Infinity |
> +------+-----------+
> > set DISABLE_CODEGEN set to true
> > select col0, min(col1) from test1 group by col0 order by col0
> +------+-----------+
> | col0 | min(col1) |
> +------+-----------+
> | 0 | NaN |
> | 1 | Infinity |
> | 2 | -Infinity |
> | 3 | 0 |
> +------+-----------+
> > set DISABLE_CODEGEN set to false
> > select col0, min(col1) from test1 group by col0 order by col0
> +------+------------------------+
> | col0 | min(col1) |
> +------+------------------------+
> | 0 | 1.797693134862316e+308 |
> | 1 | 1.797693134862316e+308 |
> | 2 | -Infinity |
> | 3 | 0 |
> +------+------------------------+
> > set DISABLE_CODEGEN set to true
> > select col0, max(col1) from test1 group by col0 order by col0
> +------+-----------+
> | col0 | max(col1) |
> +------+-----------+
> | 0 | NaN |
> | 1 | Infinity |
> | 2 | -Infinity |
> | 3 | 0 |
> +------+-----------+
> > set DISABLE_CODEGEN set to false
> > select col0, max(col1) from test1 group by col0 order by col0
> +------+-------------------------+
> | col0 | max(col1) |
> +------+-------------------------+
> | 0 | -1.797693134862316e+308 |
> | 1 | Infinity |
> | 2 | -1.797693134862316e+308 |
> | 3 | 0 |
> +------+-------------------------+
> {noformat}
> We also appear to never return 'nan' as a min or max value, despite sorting it as the lowest value when ordering a table (perhaps this is the intended behavior?):
> {noformat}
> > set DISABLE_CODEGEN_ROWS_THRESHOLD set to 0
> > select * from test2 order by col1
> +------+-----------+
> | col0 | col1 |
> +------+-----------+
> | 0 | NaN |
> | 2 | -Infinity |
> | 0 | 0 |
> | 3 | 0 |
> | 1 | 1 |
> | 2 | 2 |
> | 3 | 3 |
> | 1 | Infinity |
> +------+-----------+
> > set DISABLE_CODEGEN set to true
> > select col0, min(col1) from test2 group by col0 order by col0
> +------+-----------+
> | col0 | min(col1) |
> +------+-----------+
> | 0 | 0 |
> | 1 | 1 |
> | 2 | -Infinity |
> | 3 | 0 |
> +------+-----------+
> > set DISABLE_CODEGEN set to false
> > select col0, min(col1) from test2 group by col0 order by col0
> +------+-----------+
> | col0 | min(col1) |
> +------+-----------+
> | 0 | 0 |
> | 1 | 1 |
> | 2 | -Infinity |
> | 3 | 0 |
> +------+-----------+
> > set DISABLE_CODEGEN set to true
> > select col0, max(col1) from test2 group by col0 order by col0
> +------+-----------+
> | col0 | max(col1) |
> +------+-----------+
> | 0 | 0 |
> | 1 | Infinity |
> | 2 | 2 |
> | 3 | 3 |
> +------+-----------+
> > set DISABLE_CODEGEN set to false
> > select col0, max(col1) from test2 group by col0 order by col0
> +------+-----------+
> | col0 | max(col1) |
> +------+-----------+
> | 0 | 0 |
> | 1 | Infinity |
> | 2 | 2 |
> | 3 | 3 |
> +------+-----------+
> {noformat}
> Changing LlvmCodeGen::CodegenMinMax to use OLT/OGT float comparison functions appears to solve the first case (at least for 'nan'), but leads to us returning 'nan' as a max value in the second case.