You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Norbert Luksa (Jira)" <ji...@apache.org> on 2019/11/27 08:43:00 UTC

[jira] [Resolved] (IMPALA-6660) -0/+0 floating point do not compare as equal in hash table

     [ https://issues.apache.org/jira/browse/IMPALA-6660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Norbert Luksa resolved IMPALA-6660.
-----------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> -0/+0 floating point do not compare as equal in hash table
> ----------------------------------------------------------
>
>                 Key: IMPALA-6660
>                 URL: https://issues.apache.org/jira/browse/IMPALA-6660
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.6.0, Impala 2.7.0, Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0
>            Reporter: Tim Armstrong
>            Assignee: Norbert Luksa
>            Priority: Major
>              Labels: correctness, ramp-up
>             Fix For: Impala 3.4.0
>
>
> This can happen because we hash the binary representation of the numbers. -0/0 should be treated as equal for hash joins.
> {noformat}
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1,  (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> Fetched 0 row(s) in 0.12s
> [localhost:21000] > select * from (select cast("0" as float) c1) v1,  (select cast("0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | 0  | 0  |
> +----+----+
> Fetched 1 row(s) in 0.11s
> [localhost:21000] > select * from (select cast("-0" as float) c1) v1,  (select cast("-0" as float) c2) v2 where v1.c1 = v2.c2;
> +----+----+
> | c1 | c2 |
> +----+----+
> | -0 | -0 |
> +----+----+
> Fetched 1 row(s) in 0.11s
> {noformat}
> With aggregations, we get separate groups. I could see the argument either way on whether this is the preferred behaviour for group by, since group by already handles equality of NULL differently. The behaviour here is tied to the behaviour in the join right now, so we should make sure to add a test for this case when fixing the join.
> {noformat}
> [localhost:21000] > select distinct * from (values(cast("-0" as float)), (cast("0" as float))) v;
> +---------------------+
> | cast('-0' as float) |
> +---------------------+
> | -0                  |
> | 0                   |
> +---------------------+
> {noformat}
> *Workaround*
> Casting the floating point numbers to decimal fixes the problem.
> *Proposed solution*
> The frontend could wrap floating point expressions in the hash join or hash aggregation in a normalisation function that converts -0 to +0.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)