Posted to jira@kafka.apache.org by "Matthias J. Sax (Jira)" <ji...@apache.org> on 2021/06/02 07:10:00 UTC

[jira] [Comment Edited] (KAFKA-12718) SessionWindows are closed too early

    [ https://issues.apache.org/jira/browse/KAFKA-12718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355521#comment-17355521 ] 

Matthias J. Sax edited comment on KAFKA-12718 at 6/2/21, 7:09 AM:
------------------------------------------------------------------

`key=[k1@0/5]` is the key of a session, with data key `k1` and session start time of 0 and session end time of 5. The format is `[dataKey@windowStart/windowEnd]`.

Given the input data, we observe and expect the following: the first record creates a new session `k1@0/0`. The second record extends the existing session (gap is set to 5); in this case, we get a tombstone for the existing session and a second record for the new session. Thus, after processing the first two input records, we have 3 output records.

Seems the first 6 output records are actually the same as in the expected result, but output records 7 and 8 are not expected in the result. Given that grace-period is zero, the fourth input record `k2` with ts=6 actually closes the session `k1@0/5` (before your fix) and thus the 5th input record was not expected to produce any output – however, with the fix, `k2` does not close the window any longer, and thus we get more result records.

I guess the goal of the test was to verify that the first session gets closed, so I think the right fix is to change the input data, i.e., the timestamp of input record key=k2 should be changed from 6 to 11 to bump the time beyond session-end plus gap?
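
To make the arithmetic concrete, here is a minimal sketch of the close check. It assumes a session counts as closed once stream time is strictly greater than its close time. The class and method names are hypothetical and are not Kafka Streams APIs; only the numbers (session end 5, gap 5, grace 0, timestamps 6 and 11) come from the discussion above.

```java
// Hypothetical sketch for illustration; not the Kafka Streams implementation.
public class SessionCloseCheck {

    // Close time before the fix: the gap is omitted.
    static long closeTimeBeforeFix(long sessionEnd, long gap, long grace) {
        return sessionEnd + grace;
    }

    // Close time after the fix: the gap is included.
    static long closeTimeAfterFix(long sessionEnd, long gap, long grace) {
        return sessionEnd + gap + grace;
    }

    // Assumption: a session is closed once stream time has passed its close time.
    static boolean isClosed(long streamTime, long closeTime) {
        return streamTime > closeTime;
    }

    public static void main(String[] args) {
        long sessionEnd = 5, gap = 5, grace = 0;

        // k2 with ts=6, before the fix: close time is 5, so k1@0/5 is closed.
        System.out.println(isClosed(6, closeTimeBeforeFix(sessionEnd, gap, grace)));

        // k2 with ts=6, after the fix: close time is 10, so k1@0/5 stays open.
        System.out.println(isClosed(6, closeTimeAfterFix(sessionEnd, gap, grace)));

        // k2 with ts=11, after the fix: stream time is beyond 10, so k1@0/5 is closed.
        System.out.println(isClosed(11, closeTimeAfterFix(sessionEnd, gap, grace)));
    }
}
```

Under these assumptions, moving the `k2` timestamp to 11 restores the behavior the test presumably intended: the first session is closed before the 5th input record arrives.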


was (Author: mjsax):
`key=[k1@0/5]` is the key of a session, with data key `k1` and session start time of 0 and session end time of 5. The format is `[dataKey@windowStart/windowEnd]`.

Given the input data, we observe and expect the following: the first record creates a new session `k1@0/0`. The second record extends the existing session (gap is set to 5); in this case, we get a tombstone for the existing session and a second record for the new session. Thus, after processing the first two input records, we have 3 output records.

Seems the first 6 output records are actually the same as in the expected result, but output records 7 and 8 are not expected in the result. Given that grace-period is zero, the fourth input record `k2` with ts=6 actually closes the session `k1@0/5` and thus the 5th input record should not result in any output. Thus, the expected result seems to be correct, while the observed output records 7 and 8 are incorrect: seems this is an issue introduced with your code change?

Does this help?

> SessionWindows are closed too early
> -----------------------------------
>
>                 Key: KAFKA-12718
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12718
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>            Reporter: Matthias J. Sax
>            Assignee: Juan C. Gonzalez-Zurita
>            Priority: Major
>              Labels: beginner, easy-fix, newbie
>             Fix For: 3.0.0
>
>
> SessionWindows are defined based on a {{gap}} parameter, and also support an additional {{grace-period}} configuration to handle out-of-order data.
> To incorporate the session-gap, a session window should only be closed at {{window-end + gap}}, and to incorporate grace-period, the close time should be pushed out further to {{window-end + gap + grace}}.
> However, at the moment we compute the window close time as {{window-end + grace}}, omitting the {{gap}} parameter.
> Because the default grace-period is 24h, most users might not notice this issue. Even if they set a grace period explicitly (e.g., when using suppress()), they would most likely set a grace-period larger than gap-time and thus not hit the issue (or maybe only realize it when inspecting the behavior closely).
> However, if a user wants to disable the grace-period and sets it to zero (or any other value smaller than gap-time), sessions might be closed too early and users might notice.
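
The two close-time formulas from the description can be compared side by side. The following is a rough sketch only: the class and method names are made up for illustration and do not correspond to Kafka Streams internals, and the 5-minute gap is an arbitrary example value. It shows that the computation without {{gap}} is always early by exactly the gap, which is small relative to a 24h grace period but is the entire close delay when grace is zero.

```java
import java.time.Duration;

// Hypothetical sketch for illustration; names are not Kafka Streams APIs.
public class CloseTimeSketch {

    // Close time as described for the current behavior: window-end + grace.
    static long closeTimeWithoutGap(long windowEndMs, Duration gap, Duration grace) {
        return windowEndMs + grace.toMillis();
    }

    // Close time as described for the intended behavior: window-end + gap + grace.
    static long closeTimeWithGap(long windowEndMs, Duration gap, Duration grace) {
        return windowEndMs + gap.toMillis() + grace.toMillis();
    }

    public static void main(String[] args) {
        long windowEndMs = 0L;
        Duration gap = Duration.ofMinutes(5); // arbitrary example gap

        // Compare the default 24h grace period against a disabled grace period.
        for (Duration grace : new Duration[] {Duration.ofHours(24), Duration.ZERO}) {
            long without = closeTimeWithoutGap(windowEndMs, gap, grace);
            long with = closeTimeWithGap(windowEndMs, gap, grace);
            System.out.println("grace=" + grace
                + " withoutGap=" + without
                + " withGap=" + with
                + " closedEarlyByMs=" + (with - without));
        }
    }
}
```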


