Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/11/02 15:24:01 UTC
[jira] [Commented] (IMPALA-6661) Group by float results in one group per NaN value

    [ https://issues.apache.org/jira/browse/IMPALA-6661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673257#comment-16673257 ] 

ASF subversion and git services commented on IMPALA-6661:
---------------------------------------------------------

Commit 15d48c3205778ce775270feac10186e8e4851d7c in impala's branch refs/heads/master from [~mostrows]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=15d48c3 ]

IMPALA-6661 Make NaN values equal for grouping purposes.

Similar to the treatment of NULLs, we want to consider NaN values
as equal when grouping.

- When detecting a NaN in a set of row values, the NaN value must
  be converted to a canonical value - so that all NaN values have
  the same bit-pattern for hashing purposes.

- When doing equality evaluation, floating point types must have
  additional logic to consider NaN values as equal.

- Existing logic for handling NULLs in this way is appropriate for
  triggering this behavior for NaN values.

- Relabel "force null equality" as "inclusive equality" to expand
  the scope of the concept to a more generic form that includes NaN.

Change-Id: I996c4a2e1934fd887046ed0c55457b7285375086
Reviewed-on: http://gerrit.cloudera.org:8080/11535
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Michael Ho <kw...@cloudera.com>
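
The two mechanisms described in the commit message above (canonicalizing
NaN before hashing, and treating NaN as equal to NaN when comparing) can
be sketched roughly as follows. This is an illustration only, not the
code from the commit: the names CanonicalizeNaN, HashDouble,
InclusiveEquals and CountGroups are hypothetical, it uses double and the
standard library rather than Impala's own hash table, and the handling
of -0.0 is an extra assumption made so that the hash stays consistent
with operator==.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <limits>
#include <unordered_set>
#include <vector>

// Hypothetical helper: map every NaN to one canonical NaN so that all
// NaN values share a single bit pattern.
inline double CanonicalizeNaN(double v) {
  return std::isnan(v) ? std::numeric_limits<double>::quiet_NaN() : v;
}

// Hash the raw bits of the value after canonicalization.
inline std::size_t HashDouble(double v) {
  v = CanonicalizeNaN(v);
  // Assumption of this sketch: collapse -0.0 into +0.0, because the two
  // compare equal with == but have different bit patterns.
  if (v == 0.0) v = 0.0;
  std::uint64_t bits;
  static_assert(sizeof(bits) == sizeof(v), "expects a 64-bit double");
  std::memcpy(&bits, &v, sizeof(bits));
  return std::hash<std::uint64_t>{}(bits);
}

// "Inclusive equality": like ==, except that NaN is equal to NaN.
inline bool InclusiveEquals(double a, double b) {
  if (std::isnan(a) && std::isnan(b)) return true;
  return a == b;
}

// Adapters so the helpers can drive a standard hash container.
struct InclusiveHash {
  std::size_t operator()(double v) const { return HashDouble(v); }
};
struct InclusiveEq {
  bool operator()(double a, double b) const { return InclusiveEquals(a, b); }
};

// Counts the distinct groups a GROUP BY on these values would produce
// under inclusive equality.
inline std::size_t CountGroups(const std::vector<double>& values) {
  std::unordered_set<double, InclusiveHash, InclusiveEq> groups(
      values.begin(), values.end());
  return groups.size();
}
```

With plain operator== and a hash over the uncanonicalized bits, each NaN
in the input could end up in its own group, which is the behaviour
reported in the issue quoted below. With the helpers above, all NaN
inputs fall into one group regardless of how they were produced.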


> Group by float results in one group per NaN value
> -------------------------------------------------
>
>                 Key: IMPALA-6661
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6661
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.11.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Michal Ostrowski
>            Priority: Major
>              Labels: correctness, perf, ramp-up
>             Fix For: Impala 3.1.0
>
>
> I don't know if this is the desired behaviour but it could be problematic for some users since it will blow up the number of distinct groups in an aggregation. I suspect that it's more useful to coalesce all the NaNs into a single group, similar to how NULL is handled in GROUP BY.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("nan" as float)), (cast("nan" as float)), (sqrt(cast("-1" as float)))) v;
> +----------------------+
> | cast('nan' as float) |
> +----------------------+
> | NaN                  |
> | NaN                  |
> | NaN                  |
> +----------------------+
> Fetched 3 row(s) in 0.11s
> {noformat}
> I suspect IMPALA-6069 slightly changed the behaviour here, although it would have been broken beforehand anyway, since not all NaNs have the same bit pattern, so Equals() and Hash() were inconsistent.
> We should decide what the preferred behaviour is and tweak the behaviour of the hash table to produce it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)