You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Francesco Guardiani (Jira)" <ji...@apache.org> on 2021/10/07 07:21:00 UTC

[jira] [Created] (FLINK-24466) Interval Join late events handling behaviour is not consistent

Francesco Guardiani created FLINK-24466:
-------------------------------------------

             Summary: Interval Join late events handling behaviour is not consistent
                 Key: FLINK-24466
                 URL: https://issues.apache.org/jira/browse/FLINK-24466
             Project: Flink
          Issue Type: Bug
          Components: Table SQL / Runtime
            Reporter: Francesco Guardiani
         Attachments: Fix_late_events_filtering_for_interval_join.patch

Interval Join handles late events emitting them in the output, as a padded row. This behavior is also tested extensively in {{RowTimeIntervalJoinTest}}.

The problem with this behavior is the way an event is considered "late" or not: in order to distinguish between the two, {{RowTimeIntervalJoin}} uses the {{ctx.timerService().currentWatermark()}} to find out if an event is later than the last received watermark or not. But that method returns the "combined" watermark across all the keys, partitions and *input streams*, that is if one of the two streams goes "slower" than the other one, the returned watermark is going to be the minimum among the two.

This means that our late events handling effectively works only if the two streams run "at the same pace", otherwise we'll just see what we consider _late events_ for one of the two streams as joined.

To observe this behavior, just run the test {{IntervalJoinITCase#testRowTimeInnerJoinWithEquiTimeAttrs}} in this revision https://github.com/apache/flink/commit/7033cbfe404bea1519d3342a611e2f92768d70f9 several times and you'll see that after a couple of runs it fails, joining one of the {{"should-be-discarded"}} records. Those records are way behind the watermark - 1 second, as defined.

You'll find attached in the issue a small patch to show how this could be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)