Posted to dev@kafka.apache.org by Paolo Patierno <pp...@live.com> on 2017/06/20 16:32:19 UTC

Kafka Streams : parallelism related to repartition topic partitions as well

Hi devs,


On the following documentation page (by Confluent) I read (http://docs.confluent.io/current/streams/architecture.html#stream-partitions-and-tasks):


"the maximum parallelism at which your application may run is bounded by the maximum number of stream tasks, which itself is determined by maximum number of partitions of the input topic(s) the application is reading from. For example, if your input topic has 5 partitions, then you can run up to 5 applications instances"

but this does not seem to be entirely accurate.
The number of application instances that can do work also depends on whether the processor topology contains an "internal" repartition topic.
I tried the WordCountDemo with an input topic that has 2 partitions. In this case I'm able to run up to 4 application instances, while the 5th stays idle.
This is possible because the map() in the example triggers repartitioning (so 1 repartition topic with 2 partitions). That gives 4 tasks for the 4 partitions in total (2 for the input topic, 2 for the repartition topic), and these tasks can even run one per application instance.
According to the documentation quoted above, the maximum should be just 2 (not 4).
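
To make the arithmetic concrete, here is a minimal sketch of the counting rule as I understand it: each sub-topology gets as many tasks as the largest partition count among its source topics, and the totals are summed. This is a simplified model, not Kafka Streams code. The function name and the topic names are hypothetical, and it assumes one stream thread per application instance.

```python
def max_stream_tasks(sub_topologies):
    """Return the total number of stream tasks for a topology.

    sub_topologies is a list with one dict per sub-topology, mapping each
    source topic name to its partition count. Simplified model: every
    sub-topology contributes max(partition counts of its source topics).
    """
    return sum(max(partitions.values()) for partitions in sub_topologies)


# WordCountDemo as described above (topic names are made up):
# sub-topology 0 reads the input topic, sub-topology 1 reads the
# internal repartition topic created because of map().
word_count = [
    {"input-topic": 2},
    {"repartition-topic": 2},
]

# The simplified case from the documentation: one input topic with
# 5 partitions and no repartitioning.
no_repartition = [
    {"input-topic": 5},
]

print(max_stream_tasks(word_count))      # 4
print(max_stream_tasks(no_repartition))  # 5
```

With one thread per instance, any instance beyond this task count has nothing assigned to it, which matches the idle 5th instance.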

Can you confirm this?

Thanks,
Paolo


Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor

Twitter : @ppatierno<http://twitter.com/ppatierno>
Linkedin : paolopatierno<http://it.linkedin.com/in/paolopatierno>
Blog : DevExperience<http://paolopatierno.wordpress.com/>

Re: Kafka Streams : parallelism related to repartition topic partitions as well

Posted by "Matthias J. Sax" <ma...@confluent.io>.
Your observation is correct.

The paragraph you quote is not very precise, but it is also not
necessarily wrong. The example is simplified and assumes that there is
no re-partitioning, even though this is not mentioned explicitly.


-Matthias

