Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/20 15:14:34 UTC

[GitHub] [beam] MOscity commented on issue #21432: `beam.CombineValues` on DataFlow runner causes ambiguous failure with python SDK

MOscity commented on issue #21432:
URL: https://github.com/apache/beam/issues/21432#issuecomment-1398541772

   Hey, I'm facing the same issue. The whole pipeline (all steps) works with DirectRunner, but with DataflowRunner it fails after 1-3 seconds and emits no logs. It works fine without the CountCombineFn step.
   
   ```
   import apache_beam as beam


   def transform_data(right_side_data, step):
       # prepare_key_value is defined elsewhere; it is assumed to emit (key, value) pairs.
       data_out = (
           right_side_data
           | 'Step 1 {}'.format(step) >> beam.Map(prepare_key_value)
           | 'Step 2 {}'.format(step) >> beam.GroupByKey()
           # This line fails with DataflowRunner, but runs in DirectRunner locally:
           | 'Step 3 {}'.format(step) >> beam.CombineValues(beam.combiners.CountCombineFn())
       )
       return data_out
   ```
   
   Error Log:
   ```
   ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2023-01-20_06_59_03-4426498189309546663?project=<ProjectId>
   Traceback (most recent call last):
     File "./path/to/file/my_python.py", line 618, in <module>
       run_pipeline()
     File "./path/to/file/my_python.py", line 598, in run_pipeline
       print(f'----- After Step: {step}.')
     File "/home/myusername/.local/share/virtualenvs/pipenv_20-Y278SNFx/lib/python3.8/site-packages/apache_beam/pipeline.py", line 598, in __exit__
       self.result.wait_until_finish()
     File "/home/myusername/.local/share/virtualenvs/pipenv_20-Y278SNFx/lib/python3.8/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1641, in wait_until_finish
       raise DataflowRuntimeException(
   apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
   Error processing pipeline.
   ```
   
   I haven't figured out a workaround yet. Does anyone have an idea?
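
One direction that may be worth trying, offered as an untested sketch: `GroupByKey` followed by `CombineValues` computes a per-key combine, so 'Step 2' and 'Step 3' could be collapsed into a single `beam.CombinePerKey(...)`. Whether this avoids the Dataflow failure is an assumption and is not confirmed anywhere in this comment. The names `count_per_key` and `expected_counts` are hypothetical helpers introduced only for illustration.

```python
from collections import Counter


def expected_counts(pairs):
    """Plain-Python reference: number of values per key, sorted by key."""
    return sorted(Counter(key for key, _ in pairs).items())


def count_per_key(keyed_data, step):
    """Count values per key with a single CombinePerKey step.

    Hypothetical stand-in for 'Step 2' + 'Step 3' above. `keyed_data` is
    assumed to be a PCollection of (key, value) tuples.
    """
    # Imported here so expected_counts() stays usable without Beam installed.
    import apache_beam as beam

    return keyed_data | 'Count per key {}'.format(step) >> beam.CombinePerKey(
        beam.combiners.CountCombineFn())
```

On the DirectRunner, `count_per_key` should emit the same `(key, count)` pairs that `expected_counts` computes for the same input, which gives a quick local check before submitting the job to Dataflow.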

