Posted to commits@beam.apache.org by "Kenneth Knowles (JIRA)" <ji...@apache.org> on 2018/01/30 14:44:00 UTC

[jira] [Created] (BEAM-3568) Overlapping sessions with zero allowed lateness due to window expiry rules

Kenneth Knowles created BEAM-3568:
-------------------------------------

             Summary: Overlapping sessions with zero allowed lateness due to window expiry rules
                 Key: BEAM-3568
                 URL: https://issues.apache.org/jira/browse/BEAM-3568
             Project: Beam
          Issue Type: Bug
          Components: beam-model, runner-core
            Reporter: Kenneth Knowles
            Assignee: Kenneth Knowles


Consider this sequence, with a session gap duration of 5 and zero allowed lateness:

 - an element arrives with timestamp 0 and is assigned to proto-window [0, 5)
 - the watermark advances to 6, so the session [0, 5) is emitted and then discarded
 - an element arrives with timestamp 3 and is assigned to proto-window [3, 8); it is not dropped, because that window has not expired
 - the watermark advances to 8+, so the session [3, 8) is emitted, overlapping the session emitted earlier
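
The sequence above can be reproduced with a small simulation. This is only a sketch, not Beam code: it assumes integer timestamps, treats a window as expired once window end + allowed lateness <= watermark, and merges sessions only when they strictly overlap. The names Window, proto_window, is_expired and simulate are hypothetical.

```python
from dataclasses import dataclass

GAP = 5               # session gap duration from the example
ALLOWED_LATENESS = 0  # zero allowed lateness, as in the issue title


@dataclass(frozen=True)
class Window:
    start: int
    end: int  # exclusive


def proto_window(timestamp):
    # A session element is first assigned to [timestamp, timestamp + GAP).
    return Window(timestamp, timestamp + GAP)


def is_expired(window, watermark):
    # Simplified expiry rule (an assumption of this sketch).
    return window.end + ALLOWED_LATENESS <= watermark


def simulate(events):
    """events is a list of ("element", ts) or ("watermark", value) pairs.

    Returns the windows emitted, in emission order.
    """
    active, emitted = [], []
    watermark = float("-inf")
    for kind, value in events:
        if kind == "element":
            window = proto_window(value)
            if is_expired(window, watermark):
                continue  # late data dropping is tied to window expiry
            # Merge with every active window that overlaps the new one.
            overlapping = [a for a in active
                           if a.start < window.end and window.start < a.end]
            for a in overlapping:
                active.remove(a)
                window = Window(min(window.start, a.start),
                                max(window.end, a.end))
            active.append(window)
        else:
            watermark = value
            # Emit and discard every window the watermark has passed.
            for a in list(active):
                if is_expired(a, watermark):
                    emitted.append(a)
                    active.remove(a)
    return emitted


out_of_order = [("element", 0), ("watermark", 6),
                ("element", 3), ("watermark", 8)]
in_order = [("element", 0), ("element", 3), ("watermark", 8)]

print(simulate(out_of_order))  # two sessions that overlap each other
print(simulate(in_order))      # a single merged session
```

Under these assumptions the out-of-order run emits both [0, 5) and [3, 8), while the same two elements arriving before the watermark moves produce the single session [0, 8).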

While this is "technically correct" according to the spec, it seems undesirable, since the two emitted sessions overlap. The behavior was introduced when late data dropping was tied to window expiry. I think either dropping the second element, or including it and emitting a merged window, would be OK.

In the case of sessions, we could just retain the window until it cannot possibly merge with other non-expired data. Even with zero allowed lateness, this is double the gap duration. The window would be in an interesting state: it would be expired and ineligible for further output, but it could still merge, and the greater merged window could be output.
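
One way to read the "double the gap duration" figure, as a sketch rather than a statement of how Beam behaves: a new element with timestamp t can overlap a session ending at `end` only if t < end, and its proto-window [t, t + gap) is still non-expired only while t + gap + allowed lateness exceeds the watermark. So once the watermark reaches end + gap + allowed lateness, no non-expired element can merge with the session. The function names below are hypothetical, and the bound is conservative under the simplified expiry rule (window end + allowed lateness <= watermark).

```python
def retention_horizon(window_end, gap, allowed_lateness=0):
    # Watermark value at or beyond which no non-expired proto-window
    # can still overlap a session ending at window_end (upper bound).
    return window_end + gap + allowed_lateness


def can_still_merge(window_end, gap, watermark, allowed_lateness=0):
    # True while the session must be retained for possible merging.
    return watermark < retention_horizon(window_end, gap, allowed_lateness)


# Session [0, 5) with gap 5 and zero allowed lateness.
print(retention_horizon(5, 5))    # 10, i.e. start + 2 * gap
print(can_still_merge(5, 5, 6))   # True: the element at timestamp 3 could merge
print(can_still_merge(5, 5, 10))  # False: the session can finally be discarded
```

For the single-element session [0, 5), the horizon is 10, which is twice the gap duration after the session start.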

The challenge is that sessions are just one kind of merging window, and the merging logic has to be assumed opaque, so we cannot simply reason about how sessions work. The other, more drastic option is to rethink how late data dropping is defined for merging windows, particularly in the "proto-window" phase.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)