Posted to users@kafka.apache.org by Pietro Galassi <pi...@gmail.com> on 2021/05/06 15:11:04 UTC

KafkaStreams aggregation with multiple instance

Hi all,
I hope you can help me figure out this scenario.

I have a multi-instance microservice that consumes from a topic
(ordersTopic); all instances use the same consumer group.

This microservice uses a KStream to aggregate (sum) topic events and
produces results on another topic (countTopic).

I have two questions:

1) Can I get wrong counts because multiple instances of the same
microservice are running?
2) Do I need RocksDB and a materialized view in order to store data?

Thanks a lot.
Regards,
Pietro Galassi

Re: KafkaStreams aggregation with multiple instance

Posted by Alex Craig <al...@gmail.com>.
1.  The aggregation is done per message key.  As a silly example, if your
messages were data about new car sales and you wanted to count how many
cars were sold by color, you could consume the messages and then "re-key"
them so that the message key is the color.  Later in your streams
topology, you would aggregate (count) on that new key.  Because Kafka
guarantees that records with the same key always end up in the same
partition, and each partition is consumed by only one instance of the
group at a time, messages with the key "red" will never be consumed by
more than one instance.  "Red" might always be consumed and aggregated on
instance A, "blue" on instance B, and so on.

2.  You can use other data stores as state stores, and the documentation
describes how to do this.  However, my opinion is that unless you have a
good reason NOT to use RocksDB, I would use RocksDB - especially to start
with.

Hope that helps!

Alex
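
To make point 1 above concrete, here is a minimal sketch in plain Java (no
Kafka dependency) that simulates the routing described there.  Everything in
it is an assumption made for illustration: the class name, the partition and
instance counts, and the use of String.hashCode() as a stand-in for Kafka's
real default partitioner, which hashes the serialized key bytes.  It is not
the Kafka Streams API, only a model of why no "+1" gets lost.

```java
import java.util.*;

// Hypothetical simulation: keys -> partitions -> owning instances.
class KeyPartitionDemo {
    static final int NUM_PARTITIONS = 4; // assumed partition count
    static final int NUM_INSTANCES = 2;  // assumed number of app instances

    // Stand-in for the partitioner: the same key always maps to the
    // same partition (Kafka itself hashes the serialized key bytes).
    static int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), NUM_PARTITIONS);
    }

    // Stand-in for group assignment: every partition has exactly one owner.
    static int instanceFor(int partition) {
        return partition % NUM_INSTANCES;
    }

    // Each instance keeps its own local count store, one entry per key.
    static List<Map<String, Long>> countByColor(List<String> colors) {
        List<Map<String, Long>> stores = new ArrayList<>();
        for (int i = 0; i < NUM_INSTANCES; i++) {
            stores.add(new HashMap<>());
        }
        for (String color : colors) {
            int owner = instanceFor(partitionFor(color));
            stores.get(owner).merge(color, 1L, Long::sum);
        }
        return stores;
    }

    public static void main(String[] args) {
        List<String> sales = List.of("red", "blue", "red", "green", "blue", "red");
        List<Map<String, Long>> stores = countByColor(sales);
        for (int i = 0; i < stores.size(); i++) {
            System.out.println("instance " + i + " -> " + stores.get(i));
        }
    }
}
```

Each color shows up in exactly one instance's store, so two instances never
increment the same counter concurrently.  In a real Kafka Streams topology
the equivalent steps would be a groupBy (or selectKey followed by
groupByKey) and then count(); changing the key makes Kafka Streams write the
records through an internal repartition topic so that they are routed by the
new key.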


Re: KafkaStreams aggregation with multiple instance

Posted by Pietro Galassi <pi...@gmail.com>.
Hi Neeraj,

1) I have multiple instances reading from ordersTopic and using aggregate
(sum). So if instance A reads and does a +1 and instance B reads and does
a +1 at the same time, can I end up with wrong counts (could some +1 be
lost)? Yes, I'm using message keys and multiple partitions.

2) What state store can I use? I'm currently using Spring Kafka and it
seems to rely on RocksDB.

Regards,
Pietro


Re: KafkaStreams aggregation with multiple instance

Posted by Neeraj Vaidya <ne...@yahoo.co.in.INVALID>.
Hi Pietro,
1) What do you mean by problems in counts due to multiple instances? Also, do you use keys in your messages?
2) If you want to maintain state and refer to that state when processing each message, then yes, you will need a state store. I believe a state store is also needed if you want to query that state externally.

Regards,
Neeraj

