Posted to user@flink.apache.org by "Kathula, Sandeep via user" <us...@flink.apache.org> on 2023/01/18 17:22:25 UTC

Unequal distribution of Kafka partitions across topics when reading from multiple topics using the KafkaSource API with Flink 1.14.3

Hi,
   I am using the KafkaSource API to read from 6 Kafka topics. Flink version: 1.14.3. Every topic my Flink pipeline reads from has a different load but the same number of partitions (let's say 30). For example, partition 0 of topic 1 and partition 0 of topic 2 carry different loads, but all partitions of a given topic carry a similar load. In total, my pipeline reads from 30 * 6 = 180 partitions.
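For context, the source is set up roughly like this (a simplified sketch; the broker address, group id, and topic names below are placeholders, not my actual configuration):

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // One KafkaSource subscribed to all 6 topics (30 partitions each).
    KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("broker:9092")                  // placeholder address
            .setTopics("topic1", "topic2", "topic3",
                       "topic4", "topic5", "topic6")             // placeholder topic names
            .setGroupId("my-consumer-group")                     // placeholder group id
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

    env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");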


  1.  Initially I started with 15 task slots. Each task slot was reading from 12 partitions (2 partitions of topic1, 2 partitions of topic2, …).
  2.  I took a savepoint and redeployed the application from that savepoint, scaling it up to 20 task slots.
  3.  I then took another savepoint and redeployed the application from the new savepoint, scaling it back down to 15 task slots.

  I then observed that each task slot is still reading from 12 partitions, but no longer 2 partitions of each topic. For example, task slot 0 is reading from 3 partitions of topic1, 3 partitions of topic2, 2 partitions of topic3, 2 partitions of topic4, 1 partition of topic5 and 1 partition of topic6. Since the load differs across topics, the pods no longer process the same number of records, and we are seeing lag on the few pods that are handling partitions of the high-throughput topics.

This issue shows up during both scale-up and scale-down. I suspect the partition distribution goes wrong when the job is restored from a savepoint while scaling up or down.
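For reference, my understanding (from reading the connector code; please correct me if I am wrong) is that on a fresh start the enumerator assigns each partition to a reader with logic roughly like the sketch below, which round-robins each topic's partitions starting from a topic-dependent index and therefore spreads every topic evenly across readers:

    import org.apache.kafka.common.TopicPartition;

    // Rough sketch of KafkaSourceEnumerator's split-owner computation as I
    // understand it; not my code, and possibly inaccurate for 1.14.3.
    static int getSplitOwner(TopicPartition tp, int numReaders) {
        // The start index is derived from the topic name, so each topic's
        // partitions are distributed round-robin from a different offset.
        int startIndex = ((tp.topic().hashCode() * 31) & 0x7FFFFFFF) % numReaders;
        return (startIndex + tp.partition()) % numReaders;
    }

If splits restored from a savepoint are handed back to readers without going through this per-topic round-robin, that would explain the skew I am seeing after rescaling.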


Can someone please help with this?


Thanks,
Sandeep