Posted to github@beam.apache.org by "jihad-akl (via GitHub)" <gi...@apache.org> on 2023/01/22 08:32:59 UTC

[GitHub] [beam] jihad-akl opened a new issue, #25114: [Bug]:

jihad-akl opened a new issue, #25114:
URL: https://github.com/apache/beam/issues/25114

   ### What happened?
   
   ReadFromKafka not forwarding in streaming mode.
   
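   import json

   import apache_beam as beam
   from apache_beam.io.kafka import ReadFromKafka
   from apache_beam.options.pipeline_options import PipelineOptions

   # `topic` (a list of Kafka topic names) is defined elsewhere.
   beam_options = PipelineOptions(streaming=True)
   pipeline = beam.Pipeline(options=beam_options)

   with open("config/consumer_config_beam.json") as f:
       consumer_config = json.load(f)

   messages = (
       pipeline
       | 'Read from Kafka' >> ReadFromKafka(
           consumer_config=consumer_config,
           topics=topic
       )
       | 'Print messages' >> beam.Map(lambda message: print("received!"))
   )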
   
   Hello. In the code above, the pipeline gets stuck at ReadFromKafka: no messages are forwarded to the next step.
   If I add max_num_records, the pipeline waits for that number of records, forwards them to the next step, and then ends.
   (I am using the DirectRunner because I need to run the code locally.)
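
A minimal sketch of the bounded variant described above, assuming `max_num_records` is passed straight to `ReadFromKafka`. The helper names `load_consumer_config` and `run_bounded_read` are hypothetical. Running the pipeline itself needs apache_beam installed, Java available for the cross-language Kafka read, and a reachable broker.

```python
import json


def load_consumer_config(path):
    """Read the Kafka consumer config from a JSON file and close the file."""
    with open(path) as f:
        return json.load(f)


def run_bounded_read(consumer_config, topics, max_num_records):
    """Run a pipeline that ends after `max_num_records` Kafka records."""
    # Imported lazily so load_consumer_config works without Beam installed.
    import apache_beam as beam
    from apache_beam.io.kafka import ReadFromKafka

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read from Kafka" >> ReadFromKafka(
                consumer_config=consumer_config,
                topics=topics,
                max_num_records=max_num_records,
            )
            | "Print messages" >> beam.Map(print)
        )
```

As reported above, this form only releases records once the requested count has been read, so it does not give per-message forwarding.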
   
   
   ### Issue Priority
   
   Priority: 1 (data loss / total loss of function)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [ ] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1399639396

   Streaming by definition will not end. Besides, the Python direct runner is not meant for production and does not have full support for streaming. This is most likely working as intended.




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1543698781

   Any update for this issue in version 2.47?
   




[GitHub] [beam] alexmreis commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "alexmreis (via GitHub)" <gi...@apache.org>.
alexmreis commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1416594045

   The implementation of Kafka in the Python SDK + Portable Runner is unfortunately rather broken for streaming use cases. I don't understand why there isn't a native python implementation based on https://github.com/confluentinc/confluent-kafka-python that doesn't have to deal with the portability layer.  It would be much more reliable, even if maybe less capable of parallel compute. 
   
   Our company has abandoned Beam and Dataflow for this very reason. The last bug I opened, #22809 in August 2022, was closed today, but it still depends on 2 other issues, one of which, #25114, remains unsolved half a year later. The Python SDK is clearly not a priority for the core team. Maybe they're too busy focusing on GCP-specific products like PubSub to put in the effort to make open source tools, like Kafka, work properly in Beam's Python SDK. There isn't even a single unit test in the test suite for an unbounded Kafka stream being windowed and keyed.
   
   As someone who really believes in Beam as a great portable standard for data engineering, it's sad to see the lack of interest from the core team in anything that is not making Google money (although we would still be paying for Dataflow if it worked).




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1625338699

   Any update for this issue in version 2.49?




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1402103902

   FYI, this may be hitting #22809; the issue on the Python side may still persist.
   CC: @johnjcasey
   I will also take a look




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1403207042

   Please note that if I use the Flink runner 1.14, I get:
   ERROR:apache_beam.utils.subprocess_server:Starting job service with ['java', '-jar', '/root/.apache_beam/cache/jars/beam-runners-flink-1.14-job-server-2.44.0.jar', '--flink-master', 'http://localhost:8081', '--artifacts-dir', '/tmp/beam-temp1of29sbe/artifactsz8je29uo', '--job-port', '33755', '--artifact-port', '0', '--expansion-port', '0']
   ERROR:apache_beam.utils.subprocess_server:Error bringing up service
   Traceback (most recent call last):
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/utils/subprocess_server.py", line 88, in start
       raise RuntimeError(
   RuntimeError: Service failed to start up with error 1
   Traceback (most recent call last):
     File "main.py", line 38, in <module>
       with beam.Pipeline(options=beam_options) as p:
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/pipeline.py", line 600, in __exit__
       self.result = self.run()
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/pipeline.py", line 577, in run
       return self.runner.run_pipeline(self, self._options)
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/flink_runner.py", line 45, in run_pipeline
       return super().run_pipeline(pipeline, options)
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/portable_runner.py", line 439, in run_pipeline
       job_service_handle = self.create_job_service(options)
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/portable_runner.py", line 318, in create_job_service
       return self.create_job_service_handle(server.start(), options)
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/job_server.py", line 81, in start
       self._endpoint = self._job_server.start()
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/runners/portability/job_server.py", line 110, in start
       return self._server.start()
     File "/usr/local/lib/python3.10/dist-packages/apache_beam/utils/subprocess_server.py", line 88, in start
       raise RuntimeError(
   RuntimeError: Service failed to start up with error 
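
The Python traceback above only reports that the job service exited. One way to see the underlying output is to launch the same command by hand and capture what it prints. The sketch below is a generic diagnostic under that assumption, not a fix; `run_and_capture` is a hypothetical helper name.

```python
import shutil
import subprocess


def run_and_capture(cmd, timeout=30):
    """Run `cmd` and return (exit_code, combined_output).

    exit_code is None if the executable is not found, or if the process
    was still running when `timeout` seconds elapsed.
    """
    if shutil.which(cmd[0]) is None:
        return None, f"{cmd[0]!r} not found on PATH"
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # A server that starts correctly keeps running, so a timeout is
        # not necessarily a failure.
        output = exc.output or ""
        if isinstance(output, bytes):
            output = output.decode(errors="replace")
        return None, output
    return proc.returncode, proc.stdout
```

Passing the `['java', '-jar', ...]` list from the first ERROR line as `cmd` would show whatever the Java process printed before exiting.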




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1399879220

   True, but how can I use an Apache Beam pipeline in streaming mode if it only gathers data and does not send it to the next step?
   The "received!" print does not trigger each time I receive a message from Kafka, so this is an important bug!




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1401986512

   After researching and testing, I found that the Flink Runner does not help, because Apache Flink gathers a lot of data before releasing it. My use case is that every message I receive from Kafka needs to be forwarded to the next step in the pipeline (locally).
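
For the per-message, local use case described above, one possible stopgap outside Beam is a plain consumer loop. This is a sketch under assumptions, not a Beam feature: `forward_messages` and `handle` are hypothetical names, and `consumer` is any object whose `poll(timeout)` returns either `None` or a message with `error()` and `value()` methods, as `confluent_kafka.Consumer` does.

```python
def forward_messages(consumer, handle, max_messages=None, poll_timeout=1.0):
    """Poll `consumer` and pass each message value to `handle` as it arrives.

    Returns the number of messages forwarded. With max_messages=None the
    loop runs until interrupted, matching an unbounded stream.
    """
    forwarded = 0
    while max_messages is None or forwarded < max_messages:
        msg = consumer.poll(poll_timeout)
        if msg is None:
            continue  # nothing arrived within the timeout; poll again
        if msg.error():
            raise RuntimeError(f"Kafka error: {msg.error()}")
        handle(msg.value())  # forward this single message immediately
        forwarded += 1
    return forwarded
```

With confluent-kafka-python, the consumer would be created from a config that includes a `group.id` and subscribed to the topic before the loop starts. This gives up Beam's windowing and parallelism, so it only covers the simple forwarding case.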




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1573666804

   Not able to get into this.




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1401506534

   I am trying to implement it, but so far I am facing the same issue: I can see my pipeline in the Apache Flink UI at localhost:8081, but nothing happens. I am debugging it to see if I made a mistake.




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1407137644

   Per #22809, the cause is likely #20979: a feature is lacking in the Python portable runner. What I am not sure about is why the unbounded reader is also not working.




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1573358790

   Any update for this issue in version 2.48?




[GitHub] [beam] tvalentyn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1401309741

   https://github.com/apache/beam/issues/24528 tracks various issues related to the streaming direct runner. I am not sure whether it is able to run a simple KafkaIO pipeline. Are you able to use a portable Flink Runner, by chance?




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1452576103

   > Was this issue addressed in the new version 2.45?
   
   Not yet. This is a feature gap in the portable runner and may need substantial effort. I am currently trying to work on it, though.




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1573359814

   > > Was this issue addressed in the new version 2.45?
   > 
   > Not yet. This is a feature gap in the portable runner and may need substantial effort. I am currently trying to work on it, though.
   
   Any update?




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1658147767

   > Not able to get into this.
   
   Any idea where the problem is, so that I can try to build a workaround?




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1415986242

   yes, or feature missing




Re: [I] [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners [beam]

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1767781677

   It has been almost 1 year and I have not gotten a clear response on whether this bug will be fixed. Across 7 versions, from 2.44 to 2.51, the bug remains.
   Will this bug be fixed?




[GitHub] [beam] Abacn commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1416646555

   Hi @alexmreis, sorry if there is any misunderstanding. #22809 is closed because the issue on the KafkaIO side is fixed by #24205 (its comment closes #22809: https://github.com/apache/beam/pull/24205#issuecomment-1353257737). That said, the Dataflow Runner use case should be fixed in the upcoming Beam v2.45.0.
   
   There are still issues on portable runners (Flink, direct streaming), and they are not limited to the Kafka source: the "splittable DoFn" streaming source is not yet supported by the portable runner (#20979). I also get bitten by this issue quite often (for example, when I was validating the fix in #24205; see my comments on #22809). The gap between Dataflow and local runners is definitely an important thing that needs improvement. It has a direct impact on developers.
   
   Besides, having no unit test in the Python Kafka IO is intended. Within the cross-language framework, the code that runs the Kafka read is Java's KafkaIO, and the unit tests are exercised there. We have Cross-Language Validation Runner (XVR) tests for each xlang IO and each SDK, exercised on a schedule. I also recently added a Python KafkaIO performance test. That said, KafkaIO in both Java and Python is our team's priority.
   
   
   




[GitHub] [beam] vjixy commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "vjixy (via GitHub)" <gi...@apache.org>.
vjixy commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1407722855

   Thanks for your reply.
   So there are bugs in the portable runners.
   hmmmmm :(




[GitHub] [beam] hadikoub commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "hadikoub (via GitHub)" <gi...@apache.org>.
hadikoub commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1452021304

   Was this issue addressed in the new version 2.45?




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1403183472

   I am using this producer_config:
   {
       "bootstrap.servers": "localhost:9092",
       "enable.idempotence": true,
       "retries": 100,
       "max.in.flight.requests.per.connection": 5,
       "compression.type": "snappy",
       "linger.ms": 5,
       "batch.num.messages": 1,
       "queue.buffering.max.ms": 0,
       "queue.buffering.max.messages": 10
   }
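   and this Kafka docker-compose yml:
   https://github.com/conduktor/kafka-stack-docker-compose/blob/master/zk-single-kafka-single.yml
   for the producer code:
   import json
   import time

   # Assumption: Producer is confluent_kafka.Producer (the import was not shown).
   from confluent_kafka import Producer

   # delivery_report is a delivery callback defined elsewhere (not shown).
   with open("producer_config.json") as f:
       producer = Producer(json.load(f))
   frame_no = 0
   while True:
       frame_bytes = "hello" + str(frame_no)
       producer.produce(
           topic="multi-video-stream",
           value=frame_bytes,
           on_delivery=delivery_report,
           timestamp=frame_no,
           headers={
               "test": str.encode("test")
           }
       )
       frame_no += 1
       # producer.poll(1)
       producer.flush()

       time.sleep(0.1)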
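
The producer code above passes `delivery_report` without showing it. A minimal sketch of such a callback follows, assuming the `(err, msg)` callback shape that confluent_kafka uses for `on_delivery`. The helper `format_delivery` is a hypothetical name added so the message text can be checked separately.

```python
def format_delivery(err, msg):
    """Describe the outcome of one produce() call."""
    if err is not None:
        return f"Delivery failed: {err}"
    return (
        f"Delivered to {msg.topic()} "
        f"[{msg.partition()}] at offset {msg.offset()}"
    )


def delivery_report(err, msg):
    """Delivery callback: print whether the message reached the broker."""
    print(format_delivery(err, msg))
```

Seeing a "Delivered" line for each message would confirm that the producer side is working, which helps narrow the problem down to the read side.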




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version 2.44.0

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1403175243

   To reproduce:
   consumer_config.json:
   {
       "bootstrap.servers": "127.0.0.1:9092"
   }
   main.py:
   import json
   import apache_beam as beam
   from apache_beam.io.kafka import ReadFromKafka
   from apache_beam.options.pipeline_options import PipelineOptions

   topic = ["multi-video-stream"]

   beam_options = PipelineOptions([
       "--runner=FlinkRunner", "--flink_version=1.15", "--flink_master=localhost:8081",
       "--environment_type=LOOPBACK", "--streaming"])
   with beam.Pipeline(options=beam_options) as p:
       messages = p | 'Read from kafka' >> ReadFromKafka(
           consumer_config=json.load(open("consumer_config.json")), topics=topic)
       messages | 'Print Messages' >> beam.Map(print)
           
    
   




[GitHub] [beam] jihad-akl commented on issue #25114: [Bug]: ReadFromKafka not forwarding in streaming mode version on portable runners

Posted by "jihad-akl (via GitHub)" <gi...@apache.org>.
jihad-akl commented on issue #25114:
URL: https://github.com/apache/beam/issues/25114#issuecomment-1711088028

   Any news for version 2.50?

