You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Gioacchino Vino <gi...@gmail.com> on 2019/11/16 00:28:47 UTC
Running stateful Kafka Stream Application on topic with multiple
partitions
Hi expert,
I don't understand a kafka behavior and I'm here to ask for explanation.
My processing task is pretty simple and it's quite similar to a
change-log one.
The record value contains a key/value pair: if the new value is
different respect the stored one, forward to the output topic and update
the state store, otherwise do nothing. There is also a punctuate task
that forwards all stored data to the output topic periodically (30 seconds).
The input topic has 6 partitions.
The observed behavior is that the punctuate task sends 6 times the same
key/value pair. I figure out that there are 6 state store instances, one
for each topic partition, and this produces the undesired behavior of
having 6 times the same key/value pair, but I want only one.
I tried to use a single partition for the input topic and in this
scenario I got the correct behavior: the punctuate task sends no pair
copies.
The issue is that I don't want use input topic with a single partition
because that topic collects data from a large number of producers.
Any better explanations?
Any comments or advices?
Thank a lot in advance,
Gioacchino
Re: Running stateful Kafka Stream Application on topic with multiple
partitions
Posted by "Matthias J. Sax" <ma...@confluent.io>.
You need to partition the input data correctly, thus that all records
with the same key go the same partition. For this case, all records with
the same key will be processed by the same task, and thus each key is
stored in one shard only.
-Matthias
On 11/15/19 4:28 PM, Gioacchino Vino wrote:
> Hi expert,
>
>
> I don't understand a kafka behavior and I'm here to ask for explanation.
>
> My processing task is pretty simple and it's quite similar to a
> change-log one.
>
> The record value contains a key/value pair: if the new value is
> different respect the stored one, forward to the output topic and update
> the state store, otherwise do nothing. There is also a punctuate task
> that forwards all stored data to the output topic periodically (30
> seconds).
>
> The input topic has 6 partitions.
>
> The observed behavior is that the punctuate task sends 6 times the same
> key/value pair. I figure out that there are 6 state store instances, one
> for each topic partition, and this produces the undesired behavior of
> having 6 times the same key/value pair, but I want only one.
>
> I tried to use a single partition for the input topic and in this
> scenario I got the correct behavior: the punctuate task sends no pair
> copies.
>
> The issue is that I don't want use input topic with a single partition
> because that topic collects data from a large number of producers.
>
>
> Any better explanations?
>
> Any comments or advices?
>
>
> Thank a lot in advance,
>
> Gioacchino
>
>
>