Posted to users@kafka.apache.org by Gioacchino Vino <gi...@gmail.com> on 2019/11/16 00:28:47 UTC

Running stateful Kafka Stream Application on topic with multiple partitions

Hi experts,


I don't understand a Kafka behavior and would like to ask for an explanation.

My processing task is pretty simple and quite similar to a 
changelog one.

The record value contains a key/value pair: if the new value differs 
from the stored one, the pair is forwarded to the output topic and the 
state store is updated; otherwise nothing happens. There is also a 
punctuate task that periodically (every 30 seconds) forwards all stored 
data to the output topic.

The input topic has 6 partitions.

The observed behavior is that the punctuate task sends the same 
key/value pair 6 times. I figured out that there are 6 state store 
instances, one for each topic partition, and this produces the undesired 
behavior of emitting the same key/value pair 6 times, whereas I want it 
only once.

I tried using a single partition for the input topic, and in this 
scenario I got the correct behavior: the punctuate task sends no 
duplicate pairs.
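
A minimal sketch of the behavior described above, using plain Java maps
in place of Kafka Streams state stores. It assumes (the post does not
say so) that the records carry no Kafka record key, or a key unrelated
to the pair inside the value, so that copies of the same pair end up on
every partition. The class name, method name, and the pair
"sensor-1"/"on" are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RoundRobinDemo {

    // Returns how many pairs one punctuate round would forward when the
    // same logical pair is written `updates` times, spread over the
    // partitions in turn (assumption: no meaningful record key).
    static int emittedCopies(int numPartitions, int updates) {
        // One map per partition stands in for the per-task state store.
        List<Map<String, String>> stores = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            stores.add(new HashMap<>());
        }

        // Each task only sees its own partition, so each store treats the
        // pair as new the first time it arrives there and keeps a copy.
        for (int i = 0; i < updates; i++) {
            stores.get(i % numPartitions).put("sensor-1", "on");
        }

        // Punctuate: every task forwards everything in its own store.
        int emitted = 0;
        for (Map<String, String> store : stores) {
            emitted += store.size();
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(emittedCopies(6, 12)); // prints 6
        System.out.println(emittedCopies(1, 12)); // prints 1
    }
}
```

With 6 partitions each store holds its own copy of the pair, so one
punctuate round forwards it 6 times; with 1 partition there is a single
store and a single copy.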

The issue is that I don't want to use an input topic with a single 
partition because that topic collects data from a large number of 
producers.


Any better explanations?

Any comments or advice?


Thanks a lot in advance,

Gioacchino




Re: Running stateful Kafka Stream Application on topic with multiple partitions

Posted by "Matthias J. Sax" <ma...@confluent.io>.
You need to partition the input data correctly, so that all records
with the same key go to the same partition. In that case, all records
with the same key will be processed by the same task, and thus each key
is stored in one shard only.

-Matthias
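
A minimal sketch of the advice above, again with plain Java maps in
place of state stores. The partitioner here is a stand-in: Kafka's
default partitioner hashes the serialized key with murmur2, whereas this
sketch uses Arrays.hashCode only to stay self-contained. The class and
method names and the key "sensor-1" are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyedPartitionDemo {

    // Deterministic key-based partitioning: the same key always maps to
    // the same partition. (Placeholder hash, not Kafka's murmur2.)
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    // Returns how many pairs one punctuate round would forward when the
    // same key is written `updates` times and routed by its key.
    static int emittedCopies(String key, int numPartitions, int updates) {
        // One map per partition stands in for the per-task state store.
        List<Map<String, String>> stores = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            stores.add(new HashMap<>());
        }

        // Every update for the key lands in the same store and overwrites
        // the previous value there.
        for (int i = 0; i < updates; i++) {
            stores.get(partitionFor(key, numPartitions)).put(key, "v" + i);
        }

        // Punctuate: every task forwards everything in its own store.
        int emitted = 0;
        for (Map<String, String> store : stores) {
            emitted += store.size();
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(emittedCopies("sensor-1", 6, 12)); // prints 1
    }
}
```

In a real application this usually means having the producers set the
record key to the logical key, or re-keying the stream (for example with
KStream#selectKey) and sending it through an intermediate topic before
the stateful processor. Which option fits depends on the setup, which
the thread does not describe.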