Posted to user@flink.apache.org by "Christian Krudewig (Corporate Development) via user" <us...@flink.apache.org> on 2023/01/22 16:59:40 UTC
Flink Statefun: How to find the performance bottleneck?
Hi fellow Flink users,
I'd like to seek advice on how to find the performance bottleneck of a
stateful functions pipeline. The throughput is too low. Ideally we could
push it to 2000 messages/s, but I don't get it above 100/s. The pipeline
quickly gets under backpressure.
Some facts:
* The pipeline is running on a powerful Kubernetes cluster, with
RocksDB state backend writing to a Hadoop volume.
* There are six functions, only one of which makes use of state.
* Ingress and egress are via Kafka.
* The pipeline is set to "exactly once" semantics with checkpoints
every 10 seconds
Here is a picture from the Flink UI, showing that the active ingress is
backpressured. The functions task has subtasks which take turns being
100% busy:
What I tried:
* Scale up all function deployments heavily, although each container
is under low load
* Increase the memory for the task managers to 16 GB each
* Increase the parallelism from 3 to 7 task managers
* Tried switching on "buffer debloating"
(https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/checkpointing_under_backpressure/)
* Set execution.checkpointing.aligned-checkpoint-timeout: 300sec,
because I saw
* Increase "maxNumBatchRequests" for all functions
I hope this is all, I tried so many things.
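For reference, the settings described above would roughly correspond to a
flink-conf.yaml fragment like the following. This is only a sketch: the
actual configuration file is not part of this message, the use of
taskmanager.memory.process.size for the 16 GB setting is an assumption,
and key names can differ between Flink versions.

```yaml
# Sketch of the settings mentioned in this post (not the actual config file).

# RocksDB state backend (the checkpoint storage location is not shown here).
state.backend: rocksdb

# "Exactly once" semantics with a checkpoint every 10 seconds.
execution.checkpointing.mode: EXACTLY_ONCE
execution.checkpointing.interval: 10s

# Timeout after which aligned checkpoints fall back to unaligned ones
# (only takes effect if unaligned checkpoints are enabled).
execution.checkpointing.aligned-checkpoint-timeout: 300sec

# Buffer debloating, as described on the linked documentation page.
taskmanager.network.memory.buffer-debloat.enabled: true

# Assumption: 16 GB per task manager set via the total process size.
taskmanager.memory.process.size: 16g
```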
How can I figure out why the pipeline is slow, i.e. what the bottleneck is?
Thanks for any advice.
Best,
Christian
--
Dr. Christian Krudewig
Corporate Development - Data Analytics
Deutsche Post DHL