Posted to user@flink.apache.org by sagar loke <sa...@gmail.com> on 2018/07/18 05:54:37 UTC

FlinkKafkaConsumer configuration to consume from Multiple Kafka Topics

Hi,

We have a use case where we are consuming from more than a hundred Kafka
topics. Each topic has a different number of partitions.

As per the documentation, to parallelize consumption of a Kafka topic, we
need to set setParallelism() equal to the number of Kafka partitions for
that topic.

But if we are consuming multiple topics in Flink by providing a pattern,
e.g. my_topic_*, and each topic has a different number of partitions,

then how should we connect all of these together so that we can map Kafka
partitions to Flink parallelism correctly and programmatically (so that
we don't have to hard-code all the topic names and parallelism values,
considering that we can access the Kafka topic <-> number-of-partitions
mapping in Flink)?

Thanks,

Re: FlinkKafkaConsumer configuration to consume from Multiple Kafka Topics

Posted by Andrey Zagrebin <an...@data-artisans.com>.
Hi Sagar,

At the moment, the number of partitions in the Kafka source topics and the parallelism of the Flink Kafka source operator are completely independent. Flink internally distributes the partitions across however many parallel source subtasks you configure. If dynamic partition or topic discovery is enabled, this redistribution also happens automatically while the job is running.
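For illustration, a minimal sketch of pattern-based subscription with partition discovery (assuming the Flink 1.4+ FlinkKafkaConsumer011 connector; the broker address, group id, topic pattern, and parallelism of 8 are placeholders, not values from your setup):

import java.util.Properties;
import java.util.regex.Pattern;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class PatternConsumerJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.setProperty("group.id", "my-consumer-group");       // assumed group id
        // Enable periodic discovery of new topics/partitions matching the pattern.
        props.setProperty("flink.partition-discovery.interval-millis", "30000");

        // Subscribe to all topics matching the pattern; Flink distributes the
        // discovered partitions across the source's parallel subtasks.
        FlinkKafkaConsumer011<String> consumer = new FlinkKafkaConsumer011<>(
                Pattern.compile("my_topic_.*"),
                new SimpleStringSchema(),
                props);

        DataStream<String> stream = env
                .addSource(consumer)
                .setParallelism(8); // independent of the partition counts

        stream.print();
        env.execute("pattern-consumer");
    }
}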

Job or source parallelism can be set, for example, to the total number of Kafka partitions over all topics, if that is known in advance; to determine it programmatically, you can use the Kafka client, e.g. as sketched below.
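A minimal sketch of counting partitions programmatically (assuming the kafka-clients 0.11+ AdminClient; the broker address and topic pattern are again placeholders):

import java.util.Map;
import java.util.Properties;
import java.util.Set;
import java.util.stream.Collectors;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;

public class PartitionCounter {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // Keep only the topics matching the subscription pattern.
            Set<String> topics = admin.listTopics().names().get().stream()
                    .filter(t -> t.matches("my_topic_.*"))
                    .collect(Collectors.toSet());

            // Sum the partition counts over all matched topics.
            Map<String, TopicDescription> descriptions =
                    admin.describeTopics(topics).all().get();
            int totalPartitions = descriptions.values().stream()
                    .mapToInt(d -> d.partitions().size())
                    .sum();

            // Use this as the source parallelism,
            // e.g. env.addSource(consumer).setParallelism(totalPartitions)
            System.out.println("Total partitions: " + totalPartitions);
        }
    }
}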

Cheers,
Andrey
