Posted to github@beam.apache.org by "alexmreis (via GitHub)" <gi...@apache.org> on 2023/02/04 01:35:56 UTC

[GitHub] [beam] alexmreis commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

alexmreis commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1416594045

   The Kafka implementation in the Python SDK + Portable Runner is unfortunately rather broken for streaming use cases. I don't understand why there isn't a native Python implementation, based on https://github.com/confluentinc/confluent-kafka-python, that doesn't have to deal with the portability layer. It would be much more reliable, even if perhaps less capable of parallel compute.
   
   Our company has abandoned Beam and Dataflow for this very reason. The last bug I opened, #22809 in August 2022, was closed today, but it still depends on two other issues, one of which, #25114, remains unsolved half a year later. The Python SDK is clearly not a priority for the core team. Maybe they're too busy focusing on GCP-specific products like PubSub to put in the effort to make open source tools, like Kafka, work properly in Beam's Python SDK. There isn't even a single unit test in the test suite for an unbounded Kafka stream being windowed and keyed.
   
   As someone who really believes in Beam as a great portable standard for data engineering, it's sad to see the lack of interest from the core team in anything that is not making Google money (although we would still be paying for Dataflow if it worked).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org