You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Caizhi Weng (Jira)" <ji...@apache.org> on 2022/05/16 03:44:00 UTC
[jira] [Created] (FLINK-27627) Incorrect result when order by (string, double) pair with NaN values
Caizhi Weng created FLINK-27627:
-----------------------------------
Summary: Incorrect result when order by (string, double) pair with NaN values
Key: FLINK-27627
URL: https://issues.apache.org/jira/browse/FLINK-27627
Project: Flink
Issue Type: Bug
Components: Table SQL / Runtime
Affects Versions: 1.15.0
Reporter: Caizhi Weng
Use these test data and SQL to reproduce this exception.
gao.csv:
{code}
1.0,2.0,aaaaaaaaaaaaaaa
0.0,0.0,aaaaaaaaaaaaaaa
1.0,1.0,aaaaaaaaaaaaaaa
0.0,0.0,aaaaaaaaaaaaaaa
1.0,0.0,aaaaaaaaaaaaaaa
0.0,0.0,aaaaaaaaaaaaaaa
-1.0,0.0,aaaaaaaaaaaaaaa
1.0,-1.0,aaaaaaaaaaaaaaa
1.0,-2.0,aaaaaaaaaaaaaaa
{code}
Flink SQL:
{code}
Flink SQL> create table T ( a double, b double, c string ) WITH ( 'connector' = 'filesystem', 'path' = '/tmp/gao.csv', 'format' = 'csv' );
[INFO] Execute statement succeed.
Flink SQL> create table S ( a string, b double ) WITH ( 'connector' = 'filesystem', 'path' = '/tmp/gao2.csv', 'format' = 'csv' );
[INFO] Execute statement succeed.
Flink SQL> insert into S select c, a / b from T;
[INFO] Submitting SQL update statement to the cluster...
[INFO] SQL update statement has been successfully submitted to the cluster:
Job ID: 8c98f5bb99c2dcd28f13def916e2178a
Flink SQL> select * from S order by a, b;
+-----------------+-----------+
| a | b |
+-----------------+-----------+
| aaaaaaaaaaaaaaa | 0.5 |
| aaaaaaaaaaaaaaa | NaN |
| aaaaaaaaaaaaaaa | 1.0 |
| aaaaaaaaaaaaaaa | NaN |
| aaaaaaaaaaaaaaa | Infinity |
| aaaaaaaaaaaaaaa | NaN |
| aaaaaaaaaaaaaaa | -Infinity |
| aaaaaaaaaaaaaaa | -1.0 |
| aaaaaaaaaaaaaaa | -0.5 |
+-----------------+-----------+
9 rows in set
Flink SQL> select * from S order by b;
+-----------------+-----------+
| a | b |
+-----------------+-----------+
| aaaaaaaaaaaaaaa | -Infinity |
| aaaaaaaaaaaaaaa | -1.0 |
| aaaaaaaaaaaaaaa | -0.5 |
| aaaaaaaaaaaaaaa | 0.5 |
| aaaaaaaaaaaaaaa | 1.0 |
| aaaaaaaaaaaaaaa | Infinity |
| aaaaaaaaaaaaaaa | NaN |
| aaaaaaaaaaaaaaa | NaN |
| aaaaaaaaaaaaaaa | NaN |
+-----------------+-----------+
9 rows in set
{code}
As is shown above, when order by a (string, double) pair the result is incorrect, while order by a double column separately yields the correct result.
This is because {{BinaryIndexedSortable}} uses two comparators, the normalized key comparator which directly compares memory segments, and the record comparator which compares actual column values. If the length of sort keys are not determined (for example if the sort keys contain strings) the normalized key comparator cannot fully determine the order and it will fall back to the record comparator.
As we can see in {{GenerateUtils#generateCompare}}, record comparator compares double values directly with {{<}} and {{>}}. However for {{Double.NaN}}, every binary comparator except {{!=}} will return false, which causes this issue.
Note that we cannot simply change {{GenerateUtils#generateCompare}}. This is because comparing {{NaN}} in SQL should also return false except for {{<>}}. It is the sorting operator that requires a specific order. That is to say, the current implementation of {{GenerateUtils#generateCompare}} is correct for comparing, but not for sorting. Maybe we should generate a special comparator for all sorting operators?
--
This message was sent by Atlassian Jira
(v8.20.7#820007)