Posted to issues@flink.apache.org by GitBox <gi...@apache.org> on 2020/12/05 17:25:23 UTC

[GitHub] [flink] becketqin commented on pull request #14303: [FLINK-20379][Connector/Kafka] Improve the KafkaRecordDesrializer interface

becketqin commented on pull request #14303:
URL: https://github.com/apache/flink/pull/14303#issuecomment-739324003

I think you are right. At a high level, performance is best when all the CPU cores are busy and do no unnecessary work.

In the ideal case, there are N dedicated main threads, where N == number of CPU cores, so no computing resource is idle. These main threads would only be "interrupted" by IO, which means there are more records to be handed over to them for processing. Async IO would be beneficial because the IO can then be done in the main thread itself, without any interruption or context switch. We can achieve this in KafkaSource because `KafkaConsumer` is designed to be non-blocking.
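
To make the "no interruption" idea concrete, here is a minimal sketch. It is not the KafkaSource code: `MainThreadLoopSketch` and `drainAvailable` are hypothetical names, and a plain in-memory `Queue` stands in for whatever hands fetched bytes to the main thread. The only point is that the main thread takes what is ready and never parks waiting for IO.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: one main thread drains already-fetched records without blocking.
class MainThreadLoopSketch {

    /** Takes every record that is ready right now and deserializes it in the calling thread. */
    static List<String> drainAvailable(Queue<byte[]> fetched) {
        List<String> out = new ArrayList<>();
        byte[] raw;
        // Queue.poll() returns null immediately when nothing is ready,
        // so the calling thread is never parked waiting for IO.
        while ((raw = fetched.poll()) != null) {
            // Assumption: records are UTF-8 strings; a real deserializer would go here.
            out.add(new String(raw, StandardCharsets.UTF_8));
        }
        return out;
    }

    public static void main(String[] args) {
        Queue<byte[]> fetched = new ConcurrentLinkedQueue<>();
        fetched.add("a".getBytes(StandardCharsets.UTF_8));
        fetched.add("b".getBytes(StandardCharsets.UTF_8));
        System.out.println(drainAvailable(fetched));
    }
}
```

With N such loops pinned one per core, the only cost of IO is the handover itself.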

The only potential problem I can think of is the overhead of increasing the parallelism, e.g. a larger memory footprint, more IO buffers, etc.

And I think you are also right about the assumption under which adding more deserialization threads works. Most streaming systems in production actually have spare CPU resources, and increasing the parallelism is usually done by adding a new JVM instance, which can be expensive. So adding more deserialization threads helps.
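
For comparison, a rough sketch of the "more deserialization threads in the same JVM" option. Again this is illustrative only and not what the PR implements: `DeserializerPoolSketch`, `deserializeAll`, and the `threads` parameter are made-up names, and UTF-8 decoding stands in for real deserialization.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: use spare cores in the same JVM for deserialization.
class DeserializerPoolSketch {

    /** Deserializes records on a fixed pool and returns them in input order. */
    static List<String> deserializeAll(List<byte[]> raw, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (byte[] bytes : raw) {
                // Assumption: records are UTF-8 strings; a real deserializer would go here.
                Callable<String> task = () -> new String(bytes, StandardCharsets.UTF_8);
                pending.add(pool.submit(task));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : pending) {
                // Collecting futures in submission order preserves the record order.
                out.add(f.get());
            }
            return out;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while deserializing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("deserialization failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

This uses idle cores without paying for a new JVM instance, at the cost of a handover between the fetching thread and the pool.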

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org