You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Alexey Vasiliev <an...@list.ru.INVALID> on 2020/07/12 10:03:04 UTC

aggregating messages from topic during window period

Hi,
 
I’m trying to use kafka for the following use case:
*  My app receive events from external source;
*  It aggregates it during some window (say, 10 minutes);
*  Every 10 minutes it writes this aggregated data to external service.
I was thinking about using kafka for this, so I’ll write incoming events to a kafka topic and then read events back, aggregating it and commiting only after writing to external service occurs (so I’ll get at-least-once delivery).
My only concern is that consumer lag will be non-zero most of the time.
Is it an issue? Is there a better ways to aggregate data from kafka topic with time window?
Thanks.
 
 

Re: aggregating messages from topic during window period

Posted by "Matthias J. Sax" <mj...@apache.org>.
You can use Kafka Streams:
https://kafka.apache.org/25/documentation/streams/

The use case you describe sounds like a perfect fit it. You would need
to use the `suppress()` operator:
https://kafka.apache.org/25/documentation/streams/developer-guide/dsl-api.html#window-final-results

>> My only concern is that consumer lag will be non-zero most of the time.

In fact, using Kafka Streams, your lag won't be non-zero most if the
time, because KS will maintain your intermediate result in a
fault-tolerant manner and it will commit in-between.

You could even get exactly-once processing if you chose to write you
result back into a Kafka topic (and enable exactly-once), and let the
external service consume the result topic.


-Matthias


On 7/12/20 3:03 AM, Alexey Vasiliev wrote:
> 
> Hi,
>  
> I’m trying to use kafka for the following use case:
> *  My app receive events from external source;
> *  It aggregates it during some window (say, 10 minutes);
> *  Every 10 minutes it writes this aggregated data to external service.
> I was thinking about using kafka for this, so I’ll write incoming events to a kafka topic and then read events back, aggregating it and commiting only after writing to external service occurs (so I’ll get at-least-once delivery).
> My only concern is that consumer lag will be non-zero most of the time.
> Is it an issue? Is there a better ways to aggregate data from kafka topic with time window?
> Thanks.
>  
>  
>