Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 16:22:36 UTC
[GitHub] [beam] damccorm opened a new issue, #20244: Stateful Dataflow runner?
damccorm opened a new issue, #20244:
URL: https://github.com/apache/beam/issues/20244
Hi,
I'm trying to use the Python portable DataflowRunner with a [BagStateSpec](https://beam.apache.org/releases/pydoc/2.6.0/apache_beam.transforms.userstate.html). However, I encounter the following error:
```
Traceback (most recent call last):
  File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/leopold/.pyenv/versions/3.6.0/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 49, in <module>
    run()
  File "/Users/leopold/workspace/BenchmarkListingStreaming/listing_beam_pipeline/test_runner.py", line 44, in run
    | 'write to file' >> WriteToText(known_args.output)
  File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/pipeline.py", line 481, in __exit__
    self.run().wait_until_finish()
  File "/Users/leopold/.pyenv/versions/BenchmarkListingStreaming/lib/python3.6/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1449, in wait_until_finish
    (self.state, getattr(self._runner, 'last_error_msg', None)), self)
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 648, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/executor.py", line 176, in execute
    op.start()
  File "apache_beam/runners/worker/operations.py", line 649, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 651, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 652, in apache_beam.runners.worker.operations.DoOperation.start
  File "apache_beam/runners/worker/operations.py", line 261, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 266, in apache_beam.runners.worker.operations.Operation.start
  File "apache_beam/runners/worker/operations.py", line 597, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/worker/operations.py", line 636, in apache_beam.runners.worker.operations.DoOperation.setup
  File "apache_beam/runners/common.py", line 866, in apache_beam.runners.common.DoFnRunner.__init__
Exception: Requested execution of a stateful DoFn, but no user state context is available. This likely means that the current runner does not support the execution of stateful DoFns.
```
I've also seen this issue reported on Stack Overflow:
[https://stackoverflow.com/questions/55413690/does-google-dataflow-support-stateful-pipelines-developed-with-python-sdk](https://stackoverflow.com/questions/55413690/does-google-dataflow-support-stateful-pipelines-developed-with-python-sdk)
Do you have any idea or ETA for when this feature will be available in Beam?
Thanks!
Imported from Jira [BEAM-9655](https://issues.apache.org/jira/browse/BEAM-9655). Original Jira may contain additional context.
Reported by: leopold.boudard.