Posted to github@beam.apache.org by "pof-declaneaston (via GitHub)" <gi...@apache.org> on 2023/02/16 01:10:26 UTC

[GitHub] [beam] pof-declaneaston opened a new issue, #25502: [Bug]: Zero Backlog for DataFlow Python Kafka Pipeline

pof-declaneaston opened a new issue, #25502:
URL: https://github.com/apache/beam/issues/25502

   ### What happened?
   
   Hello,
   
   I have built a DataFlow Python pipeline that consumes from and produces to Kafka. When I run the pipeline, the backlog metric in the DataFlow job UI stays at or near 0 seemingly forever. This is incorrect: my Kafka metrics show that the consumer group's lag is constantly increasing, so the DataFlow job is not keeping up with the topic. Because the backlog stays low, the job never scales up and never catches up with the topic. I believe Java pipelines using KafkaIO correctly take consumer lag into account when tracking backlog.
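
For reference, the consumer-group lag described above is conventionally measured per partition as the log end offset minus the group's committed offset, summed over all partitions. The sketch below only illustrates that arithmetic; the function name and the offset values are hypothetical, and treating a partition with no committed offset as committed at 0 is an assumption made here for simplicity. It is not how Beam or Dataflow computes the backlog metric.

```python
def total_consumer_lag(end_offsets, committed_offsets):
    """Sum per-partition lag: log end offset minus committed offset.

    Both arguments map a partition number to an offset. A partition with
    no committed offset is treated as committed at 0 (an assumption for
    this sketch; real consumers follow their offset-reset policy).
    """
    lag = 0
    for partition, end in end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag += max(end - committed, 0)
    return lag


# Hypothetical offsets for a two-partition topic.
example_lag = total_consumer_lag({0: 100, 1: 250}, {0: 90, 1: 200})
```

A backlog metric that tracked this quantity would grow whenever the pipeline falls behind the topic, which is the signal a throughput-based autoscaler needs in order to add workers.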
   
   My pipeline is running v2.44.0. I would provide a repro, but I don't know how you would access a Kafka cluster to run it against.
   
   Thanks a lot,
   Declan
   
   ### Issue Priority
   
   Priority: 2 (default / most bugs should be filed as P2)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [X] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pof-declaneaston commented on issue #25502: [Bug]: Zero Backlog for DataFlow Python Kafka Pipeline

Posted by "pof-declaneaston (via GitHub)" <gi...@apache.org>.
pof-declaneaston commented on issue #25502:
URL: https://github.com/apache/beam/issues/25502#issuecomment-1432334863

   I tried running the pipeline with num_workers = 3, which worked well for a little while, but eventually the runner scaled it back down to 1 worker and the system is unable to keep up.




[GitHub] [beam] pof-declaneaston commented on issue #25502: [Bug]: Backlog Does Not Track Consumer Lag for DataFlow Python Kafka Pipeline

Posted by "pof-declaneaston (via GitHub)" <gi...@apache.org>.
pof-declaneaston commented on issue #25502:
URL: https://github.com/apache/beam/issues/25502#issuecomment-1433940916

   I realized that the autoscaler can actually be disabled, so I have a workaround for the issue, but I would definitely prefer a cleaner solution.
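
The comment does not say how the autoscaler was disabled. One plausible way, sketched below as an assumption rather than the author's actual configuration, is to pass the Dataflow worker options `--num_workers`, `--max_num_workers`, and `--autoscaling_algorithm=NONE` so the job stays at a fixed size. The helper name `fixed_worker_flags` is hypothetical and exists only for this illustration.

```python
def fixed_worker_flags(num_workers):
    """Build pipeline flags that pin a Dataflow job to a fixed worker count.

    Assumption: autoscaling is disabled through the worker option
    --autoscaling_algorithm=NONE; the thread does not confirm which
    setting was actually used.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [
        f"--num_workers={num_workers}",      # initial worker count
        f"--max_num_workers={num_workers}",  # cap equal to the initial count
        "--autoscaling_algorithm=NONE",      # turn autoscaling off
    ]


flags = fixed_worker_flags(3)

# Possible usage with the Beam Python SDK (not executed here):
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(flags + ["--runner=DataflowRunner", "--streaming"])
```

With autoscaling off, the job no longer scales back down to 1 worker, but it also cannot scale up when lag grows, so the worker count has to be sized by hand for peak load.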

