Posted to jira@kafka.apache.org by "Randall Hauch (Jira)" <ji...@apache.org> on 2020/05/26 17:16:00 UTC
[jira] [Commented] (KAFKA-8770) Either switch to or add an option for emit-on-change

    [ https://issues.apache.org/jira/browse/KAFKA-8770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116908#comment-17116908 ] 

Randall Hauch commented on KAFKA-8770:
--------------------------------------

Just a quick note: I've updated [https://cwiki.apache.org/confluence/display/KAFKA/KIP-557%3A+Add+emit+on+change+support+for+Kafka+Streams] to show that this KIP is still in voting, since it has received only 2 binding votes. I will also remove the KIP from the AK 2.6.0 release: the KIP freeze (May 20) has already passed, so even with an additional binding vote this KIP would not make the AK 2.6.0 deadline.

> Either switch to or add an option for emit-on-change
> ----------------------------------------------------
>
>                 Key: KAFKA-8770
>                 URL: https://issues.apache.org/jira/browse/KAFKA-8770
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>              Labels: needs-kip
>
> Currently, Streams offers two emission models:
> * emit-on-window-close: (using Suppression)
> * emit-on-update: (i.e., emit a new result whenever a new record is processed, regardless of whether the result has changed)
> There is also an option to drop some intermediate results, either using caching or suppression.
> However, there is no support for emit-on-change, in which results would be forwarded only if the result has changed. This has been reported to be an extremely valuable performance optimization for some high-traffic applications: it reduces the computational burden both internally, for downstream Streams operations, and externally, for systems that consume the results and currently have to deal with a lot of "no-op" changes.
> It would be pretty straightforward to implement this, by loading the prior result before a stateful operation and comparing it with the new result before persisting or forwarding. In many cases, we load the prior result anyway, so it may not have a significant performance impact either.
> One design challenge is what to do with timestamps. If we get one record at time 1 that produces a result, and then another at time 2 that produces a no-op, what should be the timestamp of the result, 1 or 2? emit-on-change would require us to say 1.
> Clearly, we'd need to do some serious benchmarks to evaluate any potential implementation of emit-on-change.
> Another design challenge is to decide if we should just automatically provide emit-on-change for stateful operators, or if it should be configurable. Configuration increases complexity, so unless the performance impact is high, we may just want to change the emission model without a configuration.
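
As a rough illustration of the approach the description outlines (load the prior result, compare it with the new one, and persist or forward only on change), here is a minimal Java sketch. It does not use the Kafka Streams API and is not the KIP-557 implementation: `EmitOnChangeSketch`, `Operator`, `Stamped`, and `process` are hypothetical names, and the `HashMap` is only a stand-in for a state store. It also follows the timestamp choice described above, so a no-op update leaves the stored timestamp at that of the record that last changed the result.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.BiFunction;

/** Hypothetical sketch of emit-on-change; not Kafka Streams code. */
class EmitOnChangeSketch {

    /** A result paired with the timestamp of the record that last changed it. */
    static final class Stamped<V> {
        final V value;
        final long timestamp;

        Stamped(V value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** Stand-in for a stateful operator (for example, an aggregation). */
    static final class Operator<K, V> {
        // Assumption: an in-memory map plays the role of the state store.
        private final Map<K, Stamped<V>> store = new HashMap<>();
        // Combines the prior result (null if none) with the new record's value.
        private final BiFunction<V, V, V> aggregator;

        Operator(BiFunction<V, V, V> aggregator) {
            this.aggregator = aggregator;
        }

        /**
         * Processes one record. Returns the result to forward downstream,
         * or an empty Optional when the result is unchanged (a "no-op").
         */
        Optional<Stamped<V>> process(K key, V value, long timestamp) {
            // Load the prior result before applying the stateful operation.
            Stamped<V> prior = store.get(key);
            V priorValue = (prior == null) ? null : prior.value;
            V updated = aggregator.apply(priorValue, value);

            // Compare before persisting or forwarding: an unchanged result
            // is neither stored nor emitted, so the old timestamp survives.
            if (prior != null && Objects.equals(priorValue, updated)) {
                return Optional.empty();
            }

            Stamped<V> result = new Stamped<>(updated, timestamp);
            store.put(key, result);
            return Optional.of(result);
        }

        /** Returns the currently stored result for a key, or null if none. */
        Stamped<V> current(K key) {
            return store.get(key);
        }
    }
}
```

With a running-maximum aggregator, a record with value 5 at time 1 would be forwarded, a record with value 3 at time 2 would be dropped as a no-op (the stored timestamp stays at 1), and a record with value 9 at time 3 would be forwarded again. Under emit-on-update, by contrast, all three records would produce downstream output.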


