You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Guozhang Wang (Jira)" <ji...@apache.org> on 2022/02/23 19:18:00 UTC

[jira] [Commented] (KAFKA-13678) 2nd punctuation using STREAM_TIME does not respect scheduled interval

    [ https://issues.apache.org/jira/browse/KAFKA-13678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17497010#comment-17497010 ] 

Guozhang Wang commented on KAFKA-13678:
---------------------------------------

Hello [~lorenzocagnatel] thanks for reporting. There's indeed a rationale to align the stream-time punctuation scheduling with epoch time, different from system-time punctuation: we want to keep the stream-time computational behaviors to be as deterministic as possible, i.e. if you restart your app at different times, the triggering of punctuation would still be the same (I'm saying this since today there are still a few other factors than punctuations that can cause non-detereminism, but we are also targeting them). I think [~mjsax] may also have more context to share with you.

> 2nd punctuation using STREAM_TIME does not respect scheduled interval
> ---------------------------------------------------------------------
>
>                 Key: KAFKA-13678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13678
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 3.0.0
>            Reporter: Lorenzo Cagnatel
>            Priority: Major
>
> Scheduling a punctuator using stream time, the first punctuation occurs immediately as documented, but the second one is not triggered at *t_schedule + interval* but it could happen before that time. 
> For example, assume that we schedule a punctuation every 10 sec at timestamp 5 (t5). The system now works like this:
> {noformat}
> t5 -> schedule, punctuate, next schedule at t10
> t6 -> no punctuation
> t7 -> no punctuation
> t8 -> no punctuation
> t9 -> no punctuation
> t10 -> punctuate, next schedule at t20
> ...{noformat}
> In this example the 2nd schedule occurs after 5 seconds from the first one, breaking the interval duration.
> From my point of view, a reasonable behaviour could be:
> {noformat}
> t5 -> schedule, punctuate, next schedule at t15
> t6 -> no punctuation
> t7 -> no punctuation
> t8 -> no punctuation
> t9 -> no punctuation
> t10 -> no punctuation
> t11 -> no punctuation
> t12 -> no punctuation
> t13 -> no punctuation
> t14 -> no punctuation
> t15 -> punctuate, next schedule at t25
> ...{noformat}
> The origin of this problem can be found in {*}StreamTask.schedule{*}:
> {code:java}
> /**
> * Schedules a punctuation for the processor
> *
> * @param interval the interval in milliseconds
> * @param type the punctuation type
> * @throws IllegalStateException if the current node is not null
> */
> public Cancellable schedule(final long interval, final PunctuationType type, final Punctuator punctuator) {
>    switch (type) {
>       case STREAM_TIME:
>          // align punctuation to 0L, punctuate as soon as we have data
>          return schedule(0L, interval, type, punctuator);
>       case WALL_CLOCK_TIME:
>          // align punctuation to now, punctuate after interval has elapsed
>          return schedule(time.milliseconds() + interval, interval, type, punctuator);
>       default:
>          throw new IllegalArgumentException("Unrecognized PunctuationType: " + type);
>    }
> }{code}
> when, in case of stream time, it calls *schedule* with {*}startTime=0{*}.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)