You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Satyam Shekhar <sa...@gmail.com> on 2020/11/02 20:23:01 UTC
Wrong results for join post tumble grouping
Hello,
I have a table T0 with the following schema -
root
|-- amount: BIGINT}}
|-- timestamp: TIMESTAMP(3)
The table T0 has two rows -
amount timestamp
0 0
1 86400000
The following query with tumble grouping returns the wrong result -
WITH CTE AS
(SELECT SUM(amount) AS _output,
TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND))
SELECT V0._output as V0_output, V1._output AS V1_output,
V0._dim0 as V0_time, V1._dim0 as V1_time
FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0
The returned result is -
V0_output V1_output V0_time V1_time
1 1 86401000 86401000
The expected result is -
V0_output V1_output V0_time V1_time
0 0 1000 1000
1 1 86401000 86401000
Running subquery for `CTE` returns the correct result -
SELECT SUM(amount) AS _output,
TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND)
Result (this is correct) -
_output _dim0
0 1000
1 86401000
Also, the following query without tumble grouping returns the correct
result -
WITH CTE AS
(SELECT amount AS _output, `timestamp` AS _dim0 FROM T0)
SELECT V0._output as V0_output, V1._output AS V1_output,
V0._dim0 as V0_time, V1._dim0 as V1_time
FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0
Result -
V0_output V1_output V0_time V1_time
0 0 0 0
1 1 86400000 86400000
I have filed a JIRA for the issue -
https://issues.apache.org/jira/browse/FLINK-19926#. Would love to get some
eyes on it.
Regards,
Satyam