Posted to dev@beam.apache.org by "Kathula, Sandeep" <Sa...@intuit.com> on 2020/08/10 16:28:39 UTC

Beam flink runner job not keeping up with input rate after downscaling

Hi,
   We started a Beam application on the Flink runner with a parallelism of 50. It is a stateless application. With the initial parallelism of 50, the application is able to process up to 50,000 records per second. After a week, we took a savepoint and restarted from it with a parallelism of 18. The application now processes only about 7,000 records per second, whereas we expected it to process almost 18,000 records per second. The records processed per task manager dropped to roughly half of what each task manager processed previously with 50 task managers.

 When we start a fresh application with 18 pods and no savepoint, it processes ~18,500 records per second. The problem occurs only when we downscale after taking a savepoint. We also ported the same application to a plain Flink application without Apache Beam, and it scales without any issues after restarting from a savepoint with lower parallelism. So the problem appears to be with Apache Beam or with some config we are passing to Beam/Flink. We are using the following config:

numberOfExecutionRetries=2
externalizedCheckpointsEnabled=true
retainExternalizedCheckpointsOnCancellation=true
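
For context, here is a minimal sketch of how these options are wired into the job. The class name and the omitted pipeline body are placeholders; the setters are the FlinkPipelineOptions accessors as we understand them, not an exact copy of our code:

import org.apache.beam.runners.flink.FlinkPipelineOptions;
import org.apache.beam.runners.flink.FlinkRunner;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class DownscaleRepro {
  public static void main(String[] args) {
    // Parse command-line args (e.g. --parallelism=18) into Flink runner options.
    FlinkPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(FlinkPipelineOptions.class);
    options.setRunner(FlinkRunner.class);

    // Retry and externalized-checkpoint settings from the config above.
    options.setNumberOfExecutionRetries(2);
    options.setExternalizedCheckpointsEnabled(true);
    options.setRetainExternalizedCheckpointsOnCancellation(true);

    Pipeline pipeline = Pipeline.create(options);
    // ... stateless transforms elided ...
    pipeline.run();
  }
}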


We did not set maxParallelism in our Beam application; we only specify parallelism.
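
Continuing the sketch above, this is the only parallelism-related setting we pass; the commented-out line is a hypothetical alternative we do not use, and the setter name there is our assumption about the FlinkPipelineOptions API rather than something we have verified:

// We only specify the operator parallelism on the options.
options.setParallelism(18);

// Hypothetical (not what we run): also pinning max parallelism explicitly.
// options.setMaxParallelism(180);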

Beam version - 2.19
Flink version - 1.9

Any suggestions/help would be appreciated.


Thanks
Sandeep Kathula