You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Satyam Shekhar <sa...@gmail.com> on 2020/11/02 20:23:01 UTC

Wrong results for join post tumble grouping

Hello,


I have a table T0 with the following schema -

root
      |-- amount: BIGINT}}
      |-- timestamp: TIMESTAMP(3)


The table T0 has two rows -
amount timestamp
0 0
1 86400000


The following query with tumble grouping returns the wrong result -

WITH CTE AS
  (SELECT SUM(amount) AS _output,
          TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
       FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND))
SELECT V0._output as V0_output, V1._output AS V1_output,
       V0._dim0 as V0_time, V1._dim0 as V1_time
   FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0



The returned result is -
V0_output V1_output V0_time V1_time
1 1 86401000 86401000


The expected result is -
V0_output V1_output V0_time V1_time
0 0 1000 1000
1 1 86401000 86401000


Running subquery for `CTE` returns the correct result -

SELECT SUM(amount) AS _output,
               TUMBLE_END(`timestamp`, INTERVAL '1' SECOND) AS _dim0
    FROM T0 GROUP BY TUMBLE(`timestamp`, INTERVAL '1' SECOND)


Result (this is correct) -
_output _dim0
0 1000
1 86401000
Also, the following query without tumble grouping returns the correct
result -
WITH CTE AS
  (SELECT amount AS _output, `timestamp` AS _dim0 FROM T0)
SELECT V0._output as V0_output, V1._output AS V1_output,
       V0._dim0 as V0_time, V1._dim0 as V1_time
   FROM CTE as V0 INNER JOIN CTE V1 ON V0._dim0 = V1._dim0

Result -
V0_output V1_output V0_time V1_time
0 0 0 0
1 1 86400000 86400000
I have filed a JIRA for the issue -
https://issues.apache.org/jira/browse/FLINK-19926#. Would love to get some
eyes on it.

Regards,
Satyam