Posted to jira@kafka.apache.org by "Maatari (Jira)" <ji...@apache.org> on 2020/04/30 18:43:00 UTC
[jira] [Comment Edited] (KAFKA-7224) KIP-328: Add spill-to-disk for Suppression

    [ https://issues.apache.org/jira/browse/KAFKA-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17096863#comment-17096863 ] 

Maatari edited comment on KAFKA-7224 at 4/30/20, 6:42 PM:
----------------------------------------------------------

What I call an intermediate result is best explained in the following context. Say you have the following topology:
{code:java}
ktable0.join(ktable1.groupBy(...).reduce(...), ...){code}
Here the reduce acts just like collectList in KSQL. This is a use case we have. There is a repartition topic at the groupBy, and therefore the same records are emitted multiple times while the list collected by the reduce keeps growing, until the entire topic is consumed. This in turn generates multiple results for the join as well, because the same key on the right side of the join arrives multiple times. So you end up with systematically ever-growing versions of the records. That is what I call an intermediate result. This is a way to build views on normalized data that build an entity with references to all its outgoing links. We used to do that in our databases, but it did not scale.
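
To make the emission pattern concrete, here is a minimal plain-Java sketch that simulates it without any Kafka dependency. It is not the actual topology: the class name, the method name, and the sample keys and values are all hypothetical, and it assumes every update of the reduce is forwarded downstream (no caching or suppression in between).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of a table joined with a grouped, list-collecting reduce.
// All names are hypothetical and only illustrate the emission pattern.
final class IntermediateResultsSketch {

    /**
     * Feeds (entityKey, link) records one at a time into a collect-list style
     * reduce and joins every updated list against the left-side table.
     * Returns one string per emitted join result, in emission order.
     */
    static List<String> joinOutputs(Map<String, String> leftTable, List<String[]> rightRecords) {
        Map<String, List<String>> collected = new HashMap<>(); // state of the reduce
        List<String> emitted = new ArrayList<>();
        for (String[] record : rightRecords) {
            String key = record[0];
            String link = record[1];
            // The reduce appends to the list; the updated aggregate is forwarded downstream.
            List<String> links = collected.computeIfAbsent(key, k -> new ArrayList<>());
            links.add(link);
            // Every forwarded aggregate produces a new join result for the same key.
            String left = leftTable.get(key);
            if (left != null) {
                emitted.add(key + " -> " + left + " " + links);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        Map<String, String> entities = Map.of("e1", "Entity1");
        List<String[]> links = List.of(
                new String[] {"e1", "a"},
                new String[] {"e1", "b"},
                new String[] {"e1", "c"});
        joinOutputs(entities, links).forEach(System.out::println);
    }
}
```

With the sample input, three right-side records for the same key yield three join results, each carrying a longer list: `e1 -> Entity1 [a]`, `e1 -> Entity1 [a, b]`, `e1 -> Entity1 [a, b, c]`. Only the last one is the complete view; the first two are the intermediate results described above.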


> KIP-328: Add spill-to-disk for Suppression
> ------------------------------------------
>
>                 Key: KAFKA-7224
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7224
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: John Roesler
>            Priority: Major
>
> As described in [https://cwiki.apache.org/confluence/display/KAFKA/KIP-328%3A+Ability+to+suppress+updates+for+KTables]
> Following on KAFKA-7223, implement the spill-to-disk buffering strategy.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)