You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by 董宗桢 <ja...@126.com> on 2019/10/24 02:10:33 UTC

Kafka Streams Daily Aggregation

Hello,


I wanna run Kafka Streams on my system to aggregate the users' sales order transactions based on "daily".
I know that Kafka Streams provides such mechanisms called tumbling window, but it seems to be just setting an interval to run the aggregation function. What I want is to aggregate by calendar date, which means, for example, from 10.23 00:00 AM to 10.24 00:00AM, kind of a scheduler which runs every day at 00:00AM to count all my transactions that happened last day.


Is there any functionality in Kafka streams that I can use out of the box to implement my requirement?


Thanks

Re: Kafka Streams Daily Aggregation

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Btw: There is an example implementation of a custom daily window that
considers time zones. Maybe it helps:
https://github.com/confluentinc/kafka-streams-examples/blob/5.3.1-post/src/test/java/io/confluent/examples/streams/window/DailyTimeWindows.java


-Matthias

On 10/23/19 10:02 PM, 董宗桢 wrote:
> Hello Boyang,
> 
> 
> If I start the Kafka Stream process at the middle of a day, say, 10/24 16:00 pm, and with a tumbling window size of 1 day(24 hours). Would the next aggregation run at 10/25 00:00 AM? or at 10/25 16:00 PM?
> 
> 
> 
> 
> 
> 
> 
> 
> 在 2019-10-24 11:06:24,"Boyang Chen" <re...@gmail.com> 写道:
>> Hey Zongzhen,
>>
>> I have implemented some similar functionality with KStream before. You
>> could just set tumbling window to 24 hours to get daily aggregation result.
>> As you just need calendar dates, the tumbling window computation starts
>>from system time 0 which is exactly cut-off daily.
>>
>> Boyang
>>
>> On Wed, Oct 23, 2019 at 7:21 PM 董宗桢 <ja...@126.com> wrote:
>>
>>> Hello,
>>>
>>>
>>> I wanna run Kafka Streams on my system to aggregate the users' sales order
>>> transactions based on "daily".
>>> I know that Kafka Streams provides such mechanisms called tumbling window,
>>> but it seems to be just setting an interval to run the aggregation
>>> function. What I want is to aggregate by calendar date, which means, for
>>> example, from 10.23 00:00 AM to 10.24 00:00AM, kind of a scheduler which
>>> runs every day at 00:00AM to count all my transactions that happened last
>>> day.
>>>
>>>
>>> Is there any functionality in Kafka streams that I can use out of the box
>>> to implement my requirement?
>>>
>>>
>>> Thanks


Re:Re: Kafka Streams Daily Aggregation

Posted by 董宗桢 <ja...@126.com>.
Hello Boyang,


If I start the Kafka Stream process at the middle of a day, say, 10/24 16:00 pm, and with a tumbling window size of 1 day(24 hours). Would the next aggregation run at 10/25 00:00 AM? or at 10/25 16:00 PM?








在 2019-10-24 11:06:24,"Boyang Chen" <re...@gmail.com> 写道:
>Hey Zongzhen,
>
>I have implemented some similar functionality with KStream before. You
>could just set tumbling window to 24 hours to get daily aggregation result.
>As you just need calendar dates, the tumbling window computation starts
>from system time 0 which is exactly cut-off daily.
>
>Boyang
>
>On Wed, Oct 23, 2019 at 7:21 PM 董宗桢 <ja...@126.com> wrote:
>
>> Hello,
>>
>>
>> I wanna run Kafka Streams on my system to aggregate the users' sales order
>> transactions based on "daily".
>> I know that Kafka Streams provides such mechanisms called tumbling window,
>> but it seems to be just setting an interval to run the aggregation
>> function. What I want is to aggregate by calendar date, which means, for
>> example, from 10.23 00:00 AM to 10.24 00:00AM, kind of a scheduler which
>> runs every day at 00:00AM to count all my transactions that happened last
>> day.
>>
>>
>> Is there any functionality in Kafka streams that I can use out of the box
>> to implement my requirement?
>>
>>
>> Thanks

Re: Kafka Streams Daily Aggregation

Posted by "Matthias J. Sax" <ma...@confluent.io>.
One issue to consider is timezones thought. Tumbling windows align at
timetamp zero, but zero is the start of the day in UTC only. If you are
in a different timezone, you would need to "shift" the timestamps
accordingly.

For example, you can shift them using a custom TimestampExtractor, or
you use the Processor API and modify the timestamp via
`context.forward(..., To.all().withTimestamp(...))`.


-Matthias

On 10/23/19 8:06 PM, Boyang Chen wrote:
> Hey Zongzhen,
> 
> I have implemented some similar functionality with KStream before. You
> could just set tumbling window to 24 hours to get daily aggregation result.
> As you just need calendar dates, the tumbling window computation starts
> from system time 0 which is exactly cut-off daily.
> 
> Boyang
> 
> On Wed, Oct 23, 2019 at 7:21 PM 董宗桢 <ja...@126.com> wrote:
> 
>> Hello,
>>
>>
>> I wanna run Kafka Streams on my system to aggregate the users' sales order
>> transactions based on "daily".
>> I know that Kafka Streams provides such mechanisms called tumbling window,
>> but it seems to be just setting an interval to run the aggregation
>> function. What I want is to aggregate by calendar date, which means, for
>> example, from 10.23 00:00 AM to 10.24 00:00AM, kind of a scheduler which
>> runs every day at 00:00AM to count all my transactions that happened last
>> day.
>>
>>
>> Is there any functionality in Kafka streams that I can use out of the box
>> to implement my requirement?
>>
>>
>> Thanks
> 


Re: Kafka Streams Daily Aggregation

Posted by Boyang Chen <re...@gmail.com>.
Hey Zongzhen,

I have implemented some similar functionality with KStream before. You
could just set tumbling window to 24 hours to get daily aggregation result.
As you just need calendar dates, the tumbling window computation starts
from system time 0 which is exactly cut-off daily.

Boyang

On Wed, Oct 23, 2019 at 7:21 PM 董宗桢 <ja...@126.com> wrote:

> Hello,
>
>
> I wanna run Kafka Streams on my system to aggregate the users' sales order
> transactions based on "daily".
> I know that Kafka Streams provides such mechanisms called tumbling window,
> but it seems to be just setting an interval to run the aggregation
> function. What I want is to aggregate by calendar date, which means, for
> example, from 10.23 00:00 AM to 10.24 00:00AM, kind of a scheduler which
> runs every day at 00:00AM to count all my transactions that happened last
> day.
>
>
> Is there any functionality in Kafka streams that I can use out of the box
> to implement my requirement?
>
>
> Thanks