You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 12:32:39 UTC

[GitHub] [beam] damccorm opened a new issue, #19630: Streaming from PubSub to Firestore doesn't work on Dataflow

damccorm opened a new issue, #19630:
URL: https://github.com/apache/beam/issues/19630

   I came to the same error as here [https://stackoverflow.com/questions/57059944/python-package-errors-while-running-gcp-dataflow](https://stackoverflow.com/questions/57059944/python-package-errors-while-running-gcp-dataflow) but I don't see anywhere reported thus I am creating an issue just in case.
   
   The pipeline is quite simple, reading from PubSub and writing to Firestore.
   
   Beam version used is 2.13.0, Python 2.7
   
   With DirectRunner works ok, but on Dataflow it throws the following message:
   
    
   ```
   
   java.util.concurrent.ExecutionException: java.lang.RuntimeException: Error received from SDK harness
   for instruction -81: Traceback (most recent call last):
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
   line 157, in _execute
    response = task()
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
   line 190, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
   line 312, in do_instruction
    request.instruction_id)
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
   line 331, in process_bundle
    bundle_processor.process_bundle(instruction_id))
    File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
   line 538, in process_bundle
    op.start()
    File "apache_beam/runners/worker/operations.py", line 554,
   in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
    File "apache_beam/runners/worker/operations.py",
   line 555, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
   
   File "apache_beam/runners/worker/operations.py", line 557, in apache_beam.runners.worker.operations.DoOperation.start
   
   self.dofn_runner.start()
    File "apache_beam/runners/common.py", line 778, in apache_beam.runners.common.DoFnRunner.start
   
   self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
    File "apache_beam/runners/common.py",
   line 775, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    self._reraise_augmented(exn)
   
   File "apache_beam/runners/common.py", line 800, in apache_beam.runners.common.DoFnRunner._reraise_augmented
   
   raise_with_traceback(new_exn)
    File "apache_beam/runners/common.py", line 773, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
   
   bundle_method()
    File "apache_beam/runners/common.py", line 359, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
   
   def invoke_start_bundle(self):
    File "apache_beam/runners/common.py", line 363, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
   
   self.signature.start_bundle_method.method_value())
    File "/home/zdenulo/dev/gcp_stuff/df_firestore_stream/df_firestore_stream.py",
   line 39, in start_bundle
   NameError: global name 'firestore' is not defined [while running 'generatedPtransform-64']
    
   
   ```
   
   It's interesting that using Beam version 2.12.0 solves the problem on Dataflow, it works as expected, not sure what could be the problem.
   
   Here is a repository with the code which was used [https://github.com/zdenulo/dataflow_firestore_stream](https://github.com/zdenulo/dataflow_firestore_stream)
   
    
   
   Imported from Jira [BEAM-7871](https://issues.apache.org/jira/browse/BEAM-7871). Original Jira may contain additional context.
   Reported by: zdenulo.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org