Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2019/11/22 01:41:00 UTC
[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
[ https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945 ]
Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:40 AM:
---------------------------------------------------------------------
The error happens when the main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles by pickling objects by reference. This approach does not work for restoring a pickled main session, since object classes need to be defined at the moment of unpickling. This issue will be addressed after [https://github.com/uqfoundation/dill/issues/300] is fixed, or once we start using CloudPickle as a pickler, which is being investigated in BEAM-8123.
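Below is a minimal sketch of why pickling by reference breaks when the class is missing at load time. It uses the standard-library `pickle` module instead of dill. This is an assumption: stdlib `pickle` always stores classes by reference, so it shows the same failure mode without dill's main-session machinery. The module name `launcher_main` is hypothetical and stands in for the main module. Only the class name `ParseGameEventFn` comes from the issue.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the pipeline's main module on the launching machine.
launcher_main = types.ModuleType("launcher_main")
sys.modules["launcher_main"] = launcher_main

# A class "defined in the main module": its __module__ points at launcher_main.
ParseGameEventFn = type("ParseGameEventFn", (), {"__module__": "launcher_main"})
launcher_main.ParseGameEventFn = ParseGameEventFn

# Pickling an instance stores only a reference to the class (module name + class name),
# not the class definition itself.
payload = pickle.dumps(ParseGameEventFn())


def unpickle_error(data):
    """Return the unpickling error message, or None if loading succeeds."""
    try:
        pickle.loads(data)
    except AttributeError as err:
        return str(err)
    return None


# While the class can still be looked up by name, unpickling works.
print(unpickle_error(payload))  # None

# Simulate a worker whose main module does not define the class.
del launcher_main.ParseGameEventFn
print(unpickle_error(payload))  # an AttributeError message naming ParseGameEventFn
```

The issue's traceback shows the same lookup failing. There, the name is looked up on the worker's `dataflow_worker.start` module, which does not define `ParseGameEventFn`.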
In the meantime, the following workarounds are available:
- don't use super() in the main module.
- restructure the pipeline so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session.
- refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].
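The cyclical reference comes from how Python 3 compiles methods that mention the name `super`. The compiler gives such a method a `__class__` closure cell that points back at the class defining it. This happens even for the explicit two-argument form used in the issue's example. The sketch below compares that with the `SuperClassName.method(self, ...)` workaround, which creates no such cell. The classes `Base`, `WithSuper` and `WithExplicitBase` are hypothetical illustrations, not Beam code.

```python
class Base:
    def __init__(self):
        self.initialized = True


class WithSuper(Base):
    def __init__(self):
        # Any use of the name `super` makes the compiler add a __class__ cell,
        # even when the class is passed explicitly.
        super(WithSuper, self).__init__()


class WithExplicitBase(Base):
    def __init__(self):
        # Workaround: call the base method directly; no __class__ cell is created.
        Base.__init__(self)


def closure_vars(cls):
    """Names of the free (closure) variables captured by cls.__init__."""
    return cls.__init__.__code__.co_freevars


print(closure_vars(WithSuper))         # ('__class__',)
print(closure_vars(WithExplicitBase))  # ()
# The cell refers back to the class that owns the method: class -> function -> cell -> class.
print(WithSuper.__init__.__closure__[0].cell_contents is WithSuper)  # True
```

Calling `Base.__init__(self)` directly bypasses the method resolution order. In hierarchies with multiple inheritance it can skip a base class or call one twice. That is why the workaround is not an equivalent replacement for super().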
was (Author: tvalentyn):
The error happens when the main pipeline module has class methods that refer to superclass methods using super(). A reference to super in the method code creates a cyclical reference inside the object, which dill currently handles by pickling objects by reference. This approach does not work for restoring a pickled main session, since object classes need to be defined at the moment of unpickling. This issue will be addressed after https://github.com/uqfoundation/dill/issues/300 is fixed, or once we start using CloudPickle as a pickler, which is being investigated in BEAM-8123.
In the meantime, the following workarounds are available:
- don't use super() in the main module.
- refer to superclass methods via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies.
> Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
> -----------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-6158
> URL: https://issues.apache.org/jira/browse/BEAM-6158
> Project: Beam
> Issue Type: Sub-task
> Components: sdk-py-harness
> Reporter: Mark Liu
> Assignee: Valentyn Tymofieiev
> Priority: Major
> Time Spent: 4h 10m
> Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several Beam examples:
> {noformat}
> Traceback (most recent call last):
> File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
> "__main__", mod_spec)
> File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
> exec(code, run_globals)
> File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 164, in <module>
> run()
> File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 158, in run
> | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
> File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", line 426, in __exit__
> self.run().wait_until_finish()
> File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1338, in wait_until_finish
> (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
> Traceback (most recent call last):
> File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 773, in run
> self._load_main_session(self.local_staging_directory)
> File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
> pickler.load_session(session_file)
> File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 280, in load_session
> return dill.load_session(file_path)
> File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in load_session
> module = unpickler.load()
> File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in find_class
> return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> {noformat}
>
> Note that the example has the following code [1]:
> {code:python}
> import apache_beam as beam
>
> class ParseGameEventFn(beam.DoFn):
>   def __init__(self):
>     super(ParseGameEventFn, self).__init__()
> {code}
> [1] https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/examples/complete/game/user_score.py#L81
> +cc: [~tvalentyn] [~robertwb] [~altay]