Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2018/03/14 16:31:00 UTC

[jira] [Created] (IMPALA-6661) Group by float results in one group per NaN value

Tim Armstrong created IMPALA-6661:
-------------------------------------

             Summary: Group by float results in one group per NaN value
                 Key: IMPALA-6661
                 URL: https://issues.apache.org/jira/browse/IMPALA-6661
             Project: IMPALA
          Issue Type: Bug
          Components: Backend
    Affects Versions: Impala 2.11.0, Impala 2.12.0
            Reporter: Tim Armstrong


GROUP BY and DISTINCT on a float column currently put every NaN value in its own group. I don't know if this is the desired behaviour, but it could be problematic for some users since it can blow up the number of distinct groups in an aggregation. I suspect that it's more useful to coalesce all the NaNs into a single group, similar to how NULL is handled in GROUP BY.

{noformat}
[localhost:21000] > select distinct * from (values(cast("nan" as float)), (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
+----------------------+
| cast('nan' as float) |
+----------------------+
| NaN                  |
| NaN                  |
| NaN                  |
+----------------------+
Fetched 3 row(s) in 0.11s
{noformat}

I suspect IMPALA-6069 slightly changed the behaviour here, although it would have been broken beforehand anyway, since not all NaNs have the same bit pattern, so Equals() and Hash() were inconsistent.
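
To make the NaN problem concrete, here is a minimal standalone sketch. It is not Impala's code: it uses the C++ standard library's default float hashing and `operator==`, and `BitsOf` and `CountGroupsNaive` are hypothetical names. It shows that two NaNs can differ in their bit pattern, and that a hash set relying on plain `==` keeps one entry per NaN because NaN never compares equal to anything, including itself.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <unordered_set>
#include <vector>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes 32-bit float");

// Copy out the raw 32-bit pattern of a float (memcpy avoids aliasing issues).
inline uint32_t BitsOf(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// Group values with the default std::hash<float> and operator==.
// NaN == NaN is false, so every NaN inserted becomes its own entry,
// regardless of whether the NaNs share a bit pattern.
inline std::size_t CountGroupsNaive(const std::vector<float>& values) {
  std::unordered_set<float> groups(values.begin(), values.end());
  return groups.size();
}
```

For example, a quiet NaN and its negation are both NaN but differ in the sign bit, so `BitsOf` returns different values for them, and `CountGroupsNaive` reports one group per NaN in the input.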

We should decide what the preferred behaviour is and tweak the behaviour of the hash table to produce it.
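
If the preferred behaviour turns out to be a single NaN group, one possible approach is to canonicalize NaNs before hashing and to treat any two NaNs as equal in the comparison. The sketch below illustrates that idea with `std::unordered_set` rather than Impala's hash table; `Canonicalize`, `FloatGroupHash`, `FloatGroupEq`, and `CountGroups` are hypothetical names, and folding -0.0 into +0.0 is an extra assumption made so that values comparing equal also hash identically.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Map every NaN to one canonical NaN, and -0.0 to +0.0, so that values
// the equality functor treats as equal always have the same bit pattern.
inline float Canonicalize(float f) {
  if (std::isnan(f)) return std::numeric_limits<float>::quiet_NaN();
  if (f == 0.0f) return 0.0f;  // -0.0f == 0.0f is true; return +0.0f
  return f;
}

// Hash the bit pattern of the canonicalized value.
struct FloatGroupHash {
  std::size_t operator()(float f) const {
    const float c = Canonicalize(f);
    uint32_t bits;
    std::memcpy(&bits, &c, sizeof(bits));
    return std::hash<uint32_t>{}(bits);
  }
};

// Equality that is consistent with FloatGroupHash: all NaNs are equal to
// each other; everything else uses the usual floating-point comparison.
struct FloatGroupEq {
  bool operator()(float a, float b) const {
    if (std::isnan(a) && std::isnan(b)) return true;
    return a == b;
  }
};

// Count distinct groups, coalescing all NaNs into a single group.
inline std::size_t CountGroups(const std::vector<float>& values) {
  std::unordered_set<float, FloatGroupHash, FloatGroupEq> groups(
      values.begin(), values.end());
  return groups.size();
}
```

With three NaN inputs like the ones in the query above, `CountGroups` returns 1 instead of 3. The key property is that the hash and the equality check agree: any two values the functor considers equal are hashed from the same canonical bit pattern.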