Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/20 15:14:34 UTC
[GitHub] [beam] MOscity commented on issue #21432: `beam.CombineValues` on DataFlow runner causes ambiguous failure with python SDK
MOscity commented on issue #21432:
URL: https://github.com/apache/beam/issues/21432#issuecomment-1398541772
Hey, I'm facing the same issue here: the whole pipeline works with the DirectRunner (all steps), but with the DataflowRunner it fails after 1-3 seconds and emits no logs. It works fine without the CountCombineFn step.
```
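import apache_beam as beam


def transform_data(right_side_data, step):
    # prepare_key_value is defined elsewhere; GroupByKey needs it to
    # return a (key, value) tuple for each element.
    data_out = (
        right_side_data
        # Include `step` in each label so labels stay unique per call.
        | 'Step 1 {}'.format(step) >> beam.Map(prepare_key_value)
        | 'Step 2 {}'.format(step) >> beam.GroupByKey()
        # This line fails with DataflowRunner, but runs in DirectRunner locally:
        | 'Step 3 {}'.format(step) >> beam.CombineValues(beam.combiners.CountCombineFn())
    )
    return data_out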
```
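To compare runners, it may help to have an independent reference for what Steps 2 and 3 compute: GroupByKey collects the values for each key, and CountCombineFn then reduces each group to the number of values in it. (In Beam, the same result can also be written as a single `beam.CombinePerKey(beam.combiners.CountCombineFn())`; whether that form avoids the Dataflow failure is untested here.) The sketch below is plain Python, not Beam. `count_per_key` is a hypothetical helper, and it assumes `prepare_key_value` yields `(key, value)` tuples.

```python
from collections import defaultdict


def count_per_key(pairs):
    """Hypothetical plain-Python reference for GroupByKey + CountCombineFn.

    `pairs` is any iterable of (key, value) tuples, i.e. what Step 1 is
    assumed to emit.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        # GroupByKey: collect every value under its key.
        grouped[key].append(value)
    # CombineValues(CountCombineFn()): replace each group with its size.
    return {key: len(values) for key, values in grouped.items()}


if __name__ == "__main__":
    sample = [("a", 1), ("b", 2), ("a", 3)]
    print(count_per_key(sample))  # {'a': 2, 'b': 1}
```

Running a small sample of the Step 1 output through this function gives the per-key counts the pipeline should produce on either runner, which helps confirm the DirectRunner result is the expected one.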
Error Log:
```
ERROR:apache_beam.runners.dataflow.dataflow_runner:Console URL: https://console.cloud.google.com/dataflow/jobs/<RegionId>/2023-01-20_06_59_03-4426498189309546663?project=<ProjectId>
Traceback (most recent call last):
  File "./path/to/file/my_python.py", line 618, in <module>
    run_pipeline()
  File "./path/to/file/my_python.py", line 598, in run_pipeline
    print(f'----- After Step: {step}.')
  File "/home/myusername/.local/share/virtualenvs/pipenv_20-Y278SNFx/lib/python3.8/site-packages/apache_beam/pipeline.py", line 598, in __exit__
    self.result.wait_until_finish()
  File "/home/myusername/.local/share/virtualenvs/pipenv_20-Y278SNFx/lib/python3.8/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1641, in wait_until_finish
    raise DataflowRuntimeException(
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Error processing pipeline.
```
I haven't found a workaround yet. Does anyone have an idea?
--
This is an automated message from the Apache Git Service.