You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by "Matthias J. Sax" <ma...@confluent.io> on 2019/02/21 02:55:58 UTC

Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder

Thanks. LGTM.

-Matthias

On 1/20/19 8:39 PM, Boyang Chen wrote:
> Hey Mattihas,
> 
> I have addressed the comments in KIP. Feel free to take another look.
> 
> Also you are right, those are implementation details that we could discuss in diff 😊
> 
> Boyang
> 
> ________________________________
> From: Matthias J. Sax <ma...@confluent.io>
> Sent: Saturday, January 19, 2019 3:16 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
> 
> Thank Boyang!
> 
>>> I think it should be good to just extend ConsumedInternal and MaterializedInternal with window size, and keep
>>> external API clean. Just so you know it would be even more messy for internal implementation if we don't carry
>>> the window size around within existing data struct.
> 
> I cannot follow here. But because this is internal stuff anyway, I would
> prefer to discuss this on the PR instead of the mailing list.
> 
> 
> -Matthias
> 
> On 1/18/19 10:58 AM, Boyang Chen wrote:
>> Hey Matthias,
>>
>> thanks a lot for the comments!
>>
>> It seems that `windowSize` is a mandatory argument for windowed-tables,
>> thus all overload should have the first two parameters being `String
>> topic` and `Duration windowSize`.
>> Yep, that sounds good to me.
>>
>> For session-tables, there should be no `windowSize` parameter because
>> each session can have a different size and as a matter of fact, both the
>> window start and window end timestamp are contained in the key anyway
>> for this reason. (This is different to time windows as the KIP mentions.)
>> Good suggestion, I think we should be able to skip the windowsize for session store.
>>
>> Thus, I don't think that there is any need to extend `Consumed` or
>> `Materialized` -- in contrast, extending both as suggested would result
>> in bad API, because those new methods would be available for
>> key-value-tables, too.
>> I think it should be good to just extend ConsumedInternal and MaterializedInternal with window size, and keep
>> external API clean. Just so you know it would be even more messy for internal implementation if we don't carry
>> the window size around within existing data struct.
>>
>> About generic types: why is `windowedTable()` using `Consumers<K,V>`
>> while `sessionTable` is using `Consumed<Windowed<K,V>>`? The KIP
>> mentions that we can wrap provided key-serdes automatically with
>> corresponding window serdes. Thus, it seems the correct type should be `K`?
>> Yes that's a typo, and I already fixed it.
>>
>> I will let you know when the KIP updates are done.
>>
>> Best,
>> Boyang
>> ________________________________
>> From: Matthias J. Sax <ma...@confluent.io>
>> Sent: Thursday, January 17, 2019 7:52 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
>>
>> Couple of follow up comment on the KIP:
>>
>> It seems that `windowSize` is a mandatory argument for windowed-tables,
>> thus all overload should have the first two parameters being `String
>> topic` and `Duration windowSize`.
>>
>> For session-tables, there should be no `windowSize` parameter because
>> each session can have a different size and as a matter of fact, both the
>> window start and window end timestamp are contained in the key anyway
>> for this reason. (This is different to time windows as the KIP mentions.)
>>
>> Thus, I don't think that there is any need to extend `Consumed` or
>> `Materialized` -- in contrast, extending both as suggested would result
>> in bad API, because those new methods would be available for
>> key-value-tables, too.
>>
>> About generic types: why is `windowedTable()` using `Consumers<K,V>`
>> while `sessionTable` is using `Consumed<Windowed<K,V>>`? The KIP
>> mentions that we can wrap provided key-serdes automatically with
>> corresponding window serdes. Thus, it seems the correct type should be `K`?
>>
>>
>> -Matthias
>>
>>
>> On 1/12/19 8:35 PM, Boyang Chen wrote:
>>> Hey Matthias,
>>>
>>> thanks for taking a look! It would be great to see this pushed in 2.2. Depending on the tight timeline, I hope to at least get the KIP approved so that we don't see back and forth again as the KTable API has been constantly changing. I couldn't guarantee the implementation timeline until we agree on the updated high level APIs first. Does that make sense?
>>>
>>> Best,
>>> Boyang
>>> ________________________________
>>> From: Matthias J. Sax <ma...@confluent.io>
>>> Sent: Sunday, January 13, 2019 10:53 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
>>>
>>> Do you want to get this into 2.2 release? KIP deadline is 1/24, so quite
>>> soon.
>>>
>>> Overall, the KIP is very useful. I can review again in more details if
>>> you aim for 2.2 -- did you address all previous comment about the KIP
>>> already?
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>> On 1/8/19 2:50 PM, Boyang Chen wrote:
>>>> Hey folks,
>>>>
>>>> I would like to start a discussion thread on adding new time/session windowed KTable APIs for KStream:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder
>>>>
>>>> We have been working on this thread around 7 months ago, and it is successfully applied in our internal stream applications that enable
>>>> data sharing across multiple jobs. As a matter of fact, materialization of windowed store is definitely a concrete use case that could unblock stream users to
>>>> build more complex modules.
>>>>
>>>> Let me know if the API changes makes sense.
>>>>
>>>> Best,
>>>> Boyang
>>>> KIP-300: Add Windowed KTable API in StreamsBuilder - Apache Kafka - Apache Software Foundation<https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder>
>>>> We have an existing table() API in the StreamsBuilder which could materialize a Kafka topic into a local state store called KTable. This interface is very useful when we want to back up a Kafka topic to local store. Sometimes we have certain requirement to materialize a windowed topic (or changlog ...
>>>> cwiki.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
> 


Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder

Posted by Boyang Chen <bc...@outlook.com>.
Great, thank you Matthias!


________________________________
From: Matthias J. Sax <ma...@confluent.io>
Sent: Thursday, February 21, 2019 10:55 AM
To: dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder

Thanks. LGTM.

-Matthias

On 1/20/19 8:39 PM, Boyang Chen wrote:
> Hey Mattihas,
>
> I have addressed the comments in KIP. Feel free to take another look.
>
> Also you are right, those are implementation details that we could discuss in diff 😊
>
> Boyang
>
> ________________________________
> From: Matthias J. Sax <ma...@confluent.io>
> Sent: Saturday, January 19, 2019 3:16 AM
> To: dev@kafka.apache.org
> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
>
> Thank Boyang!
>
>>> I think it should be good to just extend ConsumedInternal and MaterializedInternal with window size, and keep
>>> external API clean. Just so you know it would be even more messy for internal implementation if we don't carry
>>> the window size around within existing data struct.
>
> I cannot follow here. But because this is internal stuff anyway, I would
> prefer to discuss this on the PR instead of the mailing list.
>
>
> -Matthias
>
> On 1/18/19 10:58 AM, Boyang Chen wrote:
>> Hey Matthias,
>>
>> thanks a lot for the comments!
>>
>> It seems that `windowSize` is a mandatory argument for windowed-tables,
>> thus all overload should have the first two parameters being `String
>> topic` and `Duration windowSize`.
>> Yep, that sounds good to me.
>>
>> For session-tables, there should be no `windowSize` parameter because
>> each session can have a different size and as a matter of fact, both the
>> window start and window end timestamp are contained in the key anyway
>> for this reason. (This is different to time windows as the KIP mentions.)
>> Good suggestion, I think we should be able to skip the windowsize for session store.
>>
>> Thus, I don't think that there is any need to extend `Consumed` or
>> `Materialized` -- in contrast, extending both as suggested would result
>> in bad API, because those new methods would be available for
>> key-value-tables, too.
>> I think it should be good to just extend ConsumedInternal and MaterializedInternal with window size, and keep
>> external API clean. Just so you know it would be even more messy for internal implementation if we don't carry
>> the window size around within existing data struct.
>>
>> About generic types: why is `windowedTable()` using `Consumers<K,V>`
>> while `sessionTable` is using `Consumed<Windowed<K,V>>`? The KIP
>> mentions that we can wrap provided key-serdes automatically with
>> corresponding window serdes. Thus, it seems the correct type should be `K`?
>> Yes that's a typo, and I already fixed it.
>>
>> I will let you know when the KIP updates are done.
>>
>> Best,
>> Boyang
>> ________________________________
>> From: Matthias J. Sax <ma...@confluent.io>
>> Sent: Thursday, January 17, 2019 7:52 AM
>> To: dev@kafka.apache.org
>> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
>>
>> Couple of follow up comment on the KIP:
>>
>> It seems that `windowSize` is a mandatory argument for windowed-tables,
>> thus all overload should have the first two parameters being `String
>> topic` and `Duration windowSize`.
>>
>> For session-tables, there should be no `windowSize` parameter because
>> each session can have a different size and as a matter of fact, both the
>> window start and window end timestamp are contained in the key anyway
>> for this reason. (This is different to time windows as the KIP mentions.)
>>
>> Thus, I don't think that there is any need to extend `Consumed` or
>> `Materialized` -- in contrast, extending both as suggested would result
>> in bad API, because those new methods would be available for
>> key-value-tables, too.
>>
>> About generic types: why is `windowedTable()` using `Consumers<K,V>`
>> while `sessionTable` is using `Consumed<Windowed<K,V>>`? The KIP
>> mentions that we can wrap provided key-serdes automatically with
>> corresponding window serdes. Thus, it seems the correct type should be `K`?
>>
>>
>> -Matthias
>>
>>
>> On 1/12/19 8:35 PM, Boyang Chen wrote:
>>> Hey Matthias,
>>>
>>> thanks for taking a look! It would be great to see this pushed in 2.2. Depending on the tight timeline, I hope to at least get the KIP approved so that we don't see back and forth again as the KTable API has been constantly changing. I couldn't guarantee the implementation timeline until we agree on the updated high level APIs first. Does that make sense?
>>>
>>> Best,
>>> Boyang
>>> ________________________________
>>> From: Matthias J. Sax <ma...@confluent.io>
>>> Sent: Sunday, January 13, 2019 10:53 AM
>>> To: dev@kafka.apache.org
>>> Subject: Re: [DISCUSS] KIP-300: Add Windowed KTable API in StreamsBuilder
>>>
>>> Do you want to get this into 2.2 release? KIP deadline is 1/24, so quite
>>> soon.
>>>
>>> Overall, the KIP is very useful. I can review again in more details if
>>> you aim for 2.2 -- did you address all previous comment about the KIP
>>> already?
>>>
>>>
>>> -Matthias
>>>
>>>
>>>
>>> On 1/8/19 2:50 PM, Boyang Chen wrote:
>>>> Hey folks,
>>>>
>>>> I would like to start a discussion thread on adding new time/session windowed KTable APIs for KStream:
>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder
KIP-300: Add Windowed KTable API in StreamsBuilder - Apache Kafka - Apache Software Foundation<https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder>
cwiki.apache.org
We have an existing table() API in the StreamsBuilder which could materialize a Kafka topic into a local state store called KTable. Sometimes we have certain requirement to materialize a windowed topic (or changlog topic) created by another Stream application into local store, too. The current ...



>>>>
>>>> We have been working on this thread around 7 months ago, and it is successfully applied in our internal stream applications that enable
>>>> data sharing across multiple jobs. As a matter of fact, materialization of windowed store is definitely a concrete use case that could unblock stream users to
>>>> build more complex modules.
>>>>
>>>> Let me know if the API changes makes sense.
>>>>
>>>> Best,
>>>> Boyang
>>>> KIP-300: Add Windowed KTable API in StreamsBuilder - Apache Kafka - Apache Software Foundation<https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder>
>>>> We have an existing table() API in the StreamsBuilder which could materialize a Kafka topic into a local state store called KTable. This interface is very useful when we want to back up a Kafka topic to local store. Sometimes we have certain requirement to materialize a windowed topic (or changlog ...
>>>> cwiki.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>