Posted to github@beam.apache.org by "pof-declaneaston (via GitHub)" <gi...@apache.org> on 2023/02/09 18:49:20 UTC

[GitHub] [beam] pof-declaneaston commented on issue #22809: [Bug]: Python SDK gets stuck when using Unbounded PCollection in streaming mode on GroupByKey after ReadFromKafka on DirectRunner, FlinkRunner and DataflowRunner

pof-declaneaston commented on issue #22809:
URL: https://github.com/apache/beam/issues/22809#issuecomment-1424661912

   Hello everyone. I am trying to build a Python Dataflow pipeline with Kafka as the input. I am experiencing issues consuming from Kafka with both the DirectRunner and DataflowRunner. If I add max_records I can consume data with the DirectRunner, but I haven't been able to consume messages with the DataflowRunner. I think the Dataflow issue might actually be related to networking between GCP and my on-prem environment, which I am working on, but it looks like others have struggled to get Dataflow working correctly.
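   For reference, here is a minimal sketch of the bounded-read workaround I mean. It assumes a hypothetical broker at `localhost:9092` and a topic named `my-topic`; the relevant parameter on `ReadFromKafka` is `max_num_records`, which makes the source bounded and so avoids the streaming GroupByKey hang on the DirectRunner. This cannot run without a reachable Kafka broker and a Java expansion service, so treat it as a configuration sketch, not a tested pipeline:
   
   ```python
   import apache_beam as beam
   from apache_beam.io.kafka import ReadFromKafka
   from apache_beam.options.pipeline_options import PipelineOptions
   
   options = PipelineOptions(["--runner=DirectRunner", "--streaming"])
   
   with beam.Pipeline(options=options) as p:
       (
           p
           | "ReadKafka" >> ReadFromKafka(
               consumer_config={
                   # hypothetical broker address; replace with your own
                   "bootstrap.servers": "localhost:9092",
                   "auto.offset.reset": "earliest",
               },
               topics=["my-topic"],  # hypothetical topic name
               # Bounds the read so the pipeline can make progress on
               # the DirectRunner; without this it gets stuck at the
               # GroupByKey, as described in this issue.
               max_num_records=100,
           )
           | "Print" >> beam.Map(print)
       )
   ```
   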
   
   I can see a couple of different tickets related to this issue and I wanted to ask for some clarity on the situation as there is a lot of information:
   
   1. Is there a workaround for the issue on Dataflow with v2.44.0 or earlier?
   2. It looks like the issue in the Dataflow runner is being addressed in v2.45.0. Is there any estimate on when that version will be available to the public? An RC release would work well enough.
   3. Will the issue in the DirectRunner be addressed in an upcoming release?
   
   Thanks a lot for any help!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org