You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@kafka.apache.org by "Matthias J. Sax (JIRA)" <ji...@apache.org> on 2019/02/01 22:11:01 UTC

[jira] [Commented] (KAFKA-6049) Kafka Streams: Add Cogroup in the DSL

    [ https://issues.apache.org/jira/browse/KAFKA-6049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16758710#comment-16758710 ] 

Matthias J. Sax commented on KAFKA-6049:
----------------------------------------

Feature freeze deadline for 2.2 was yesterday. Moving this 2.3 release.

> Kafka Streams: Add Cogroup in the DSL
> -------------------------------------
>
>                 Key: KAFKA-6049
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6049
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: api, kip, user-experience
>
> When multiple streams aggregate together to form a single larger object (e.g. a shopping website may have a cart stream, a wish list stream, and a purchases stream. Together they make up a Customer), it is very difficult to accommodate this in the Kafka-Streams DSL: it generally requires you to group and aggregate all of the streams to KTables then make multiple outer join calls to end up with a KTable with your desired object. This will create a state store for each stream and a long chain of ValueJoiners that each new record must go through to get to the final object.
> Creating a cogroup method where you use a single state store will:
> * Reduce the number of gets from state stores. With the multiple joins when a new value comes into any of the streams a chain reaction happens where the join processor keep calling ValueGetters until we have accessed all state stores.
> * Slight performance increase. As described above all ValueGetters are called also causing all ValueJoiners to be called forcing a recalculation of the current joined value of all other streams, impacting performance.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)