Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/03/14 16:31:00 UTC
[jira] [Created] (IMPALA-6661) Group by float results in one group per NaN value
Tim Armstrong created IMPALA-6661:
-------------------------------------
Summary: Group by float results in one group per NaN value
Key: IMPALA-6661
URL: https://issues.apache.org/jira/browse/IMPALA-6661
Project: IMPALA
Issue Type: Bug
Components: Backend
Affects Versions: Impala 2.11.0, Impala 2.12.0
Reporter: Tim Armstrong
I don't know if this is the desired behaviour, but it could be problematic for some users, since it can blow up the number of distinct groups in an aggregation: every NaN value ends up in its own group. I suspect it's more useful to coalesce all NaNs into a single group, similar to how NULL is handled in GROUP BY.
{noformat}
[localhost:21000] > select distinct * from (values(cast("nan" as float)), (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
+----------------------+
| cast('nan' as float) |
+----------------------+
| NaN                  |
| NaN                  |
| NaN                  |
+----------------------+
Fetched 3 row(s) in 0.11s
{noformat}
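To make the grouping difference concrete, here is a minimal C++ sketch of a toy GROUP BY over a nullable FLOAT column. It is not Impala's hash table: `Key`, `IeeeEquals`, `NanAwareEquals`, `CountGroups` and `SampleRows` are hypothetical names, and the NaN-aware rule is the proposed behaviour, not something Impala is known to implement. The sample rows are the three NaNs from the reproduction query above, plus two NULLs.

```cpp
#include <cassert>   // used by the usage checks
#include <cmath>
#include <cstddef>
#include <limits>
#include <optional>
#include <vector>

// Hypothetical nullable FLOAT grouping key; std::nullopt stands in for SQL NULL.
using Key = std::optional<float>;

// IEEE semantics: NaN compares unequal to everything, including itself.
// All NULLs fall into one group, as they do in GROUP BY.
bool IeeeEquals(const Key& a, const Key& b) {
  if (!a || !b) return !a && !b;
  return *a == *b;
}

// Proposed semantics (an assumption): every NaN is one value, just like NULL.
bool NanAwareEquals(const Key& a, const Key& b) {
  if (!a || !b) return !a && !b;
  if (std::isnan(*a) || std::isnan(*b)) return std::isnan(*a) && std::isnan(*b);
  return *a == *b;
}

// Toy GROUP BY: a row joins the first existing group whose key compares equal,
// otherwise it starts a new group. Linear scan, for illustration only.
template <typename Eq>
std::size_t CountGroups(const std::vector<Key>& rows, Eq eq) {
  std::vector<Key> group_keys;
  for (const Key& row : rows) {
    bool found = false;
    for (const Key& key : group_keys) {
      if (eq(key, row)) {
        found = true;
        break;
      }
    }
    if (!found) group_keys.push_back(row);
  }
  return group_keys.size();
}

// The three NaNs from the reproduction query, plus two NULLs.
std::vector<Key> SampleRows() {
  const float nan = std::numeric_limits<float>::quiet_NaN();
  return {nan, nan, std::sqrt(-1.0f), std::nullopt, std::nullopt};
}
```

With `IeeeEquals` the five rows form four groups (one per NaN, one for NULL); with `NanAwareEquals` they form two, which is the NULL-like behaviour suggested above.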
I suspect IMPALA-6069 slightly changed the behaviour here, although it would have been broken beforehand anyway: not all NaNs have the same bit pattern, so Equals() and Hash() were inconsistent.
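A hash table relies on the rule that keys which Equals() treats as equal must also produce the same Hash(). The C++ sketch below shows one way NaN bit patterns can break that rule: an equality that treats all NaNs alike, paired with a hash computed over raw bits. `NanEquals`, `BitwiseHash` and `HashConsistent` are hypothetical stand-ins, not Impala's actual functions, and the two bit patterns are arbitrary examples of quiet NaNs that differ only in the sign bit.

```cpp
#include <cassert>   // used by the usage checks
#include <cmath>
#include <cstdint>
#include <cstring>

// Copy a float's bits into an integer (and back) without undefined behaviour.
std::uint32_t FloatBits(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

float FloatFromBits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Hypothetical hash over the raw bit pattern. Multiplying by an odd constant
// is a bijection on 32-bit values, so distinct bit patterns never collide.
std::uint32_t BitwiseHash(float f) { return FloatBits(f) * 0x9E3779B1u; }

// Hypothetical equality that treats every NaN as equal to every other NaN.
bool NanEquals(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return std::isnan(a) && std::isnan(b);
  return a == b;
}

// The hash-table contract: NanEquals(a, b) must imply equal hashes.
bool HashConsistent(float a, float b) {
  return !NanEquals(a, b) || BitwiseHash(a) == BitwiseHash(b);
}

// Two quiet NaNs that differ only in the sign bit.
const float kPositiveNan = FloatFromBits(0x7FC00000u);
const float kNegativeNan = FloatFromBits(0xFFC00000u);
```

Because the two NaNs compare equal but hash differently, a typical hash table probes a different bucket for each one and never compares them, so each bit pattern can still end up in its own group.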
We should decide what the preferred behaviour is and tweak the hash table to produce it.
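One possible tweak, sketched here in C++ under the assumption that all NaNs should form a single group, is to normalize the key before both hashing and comparing. Every NaN maps to one canonical quiet NaN, and -0.0 maps to +0.0 so that values which compare equal under IEEE rules also share a bit pattern. `CanonicalBits`, `CanonicalHash`, `CanonicalEqual` and `FloatGroupSet` are hypothetical names, and `std::unordered_set` stands in for Impala's own hash table.

```cpp
#include <cassert>   // used by the usage checks
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>
#include <unordered_set>

// Hypothetical key normalization: every NaN becomes one canonical quiet NaN and
// -0.0 becomes +0.0, then the float's bits are returned as an integer.
std::uint32_t CanonicalBits(float f) {
  if (std::isnan(f)) {
    f = std::numeric_limits<float>::quiet_NaN();
  } else if (f == 0.0f) {
    f = 0.0f;  // -0.0f == 0.0f is true, so this clears the sign of zero
  }
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return bits;
}

// Hash and equality both use the canonical bits, so equal keys always hash
// equally and equality is a true equivalence relation (NaN equals NaN).
struct CanonicalHash {
  std::size_t operator()(float f) const {
    return std::hash<std::uint32_t>{}(CanonicalBits(f));
  }
};

struct CanonicalEqual {
  bool operator()(float a, float b) const {
    return CanonicalBits(a) == CanonicalBits(b);
  }
};

using FloatGroupSet = std::unordered_set<float, CanonicalHash, CanonicalEqual>;
```

Under this sketch the three NaN values from the reproduction query collapse into a single key. The alternative, keeping strict IEEE semantics where NaN equals nothing, would need a different but equally consistent pair of functions; which one Impala should adopt is the open question in this issue.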
--
This message was sent by Atlassian JIRA (v7.6.3#76005)