You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Oleksandr Nitavskyi (Jira)" <ji...@apache.org> on 2024/04/22 15:20:00 UTC

[jira] [Commented] (FLINK-35210) Give the option to set automatically the parallelism of the KafkaSource to the number of kafka partitions

    [ https://issues.apache.org/jira/browse/FLINK-35210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17839739#comment-17839739 ] 

Oleksandr Nitavskyi commented on FLINK-35210:
---------------------------------------------

Thanks [~npfp] for suggestion. I believe what you proposed is often resolve with some wrapper around KafkaSource, which could be a layer of indirection to do a lot of things, e.g. parallelism config.

Meanwhile could you please elaborate how could bad parallelism lead to the Idle tasks? Do you mean the case where Source parallelism is lower than the amount of partitions and thus you have Source which consumes nothing and thus you have no watermark advancement unless [Idleness|https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/dev/datastream/event-time/generating_watermarks/#dealing-with-idle-sources] is not configured.

> Give the option to set automatically the parallelism of the KafkaSource to the number of kafka partitions
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-35210
>                 URL: https://issues.apache.org/jira/browse/FLINK-35210
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / Kafka
>            Reporter: Nicolas Perrin
>            Priority: Minor
>
> Currently the setting of the `KafkaSource` Flink's operator parallelism needs to be manually chosen which can leads to highly skewed tasks if the developer doesn't do this job.
> To avoid this issue, I propose to:
> -  retrieve dynamically the number of partitions of the topic using `KafkaConsumer.
> partitionsFor(topic).size()`,
> - set the parallelism of the stream built from the source based on this value.
>  This way there won't be any idle tasks.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)