Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2019/11/22 01:41:00 UTC

[jira] [Comment Edited] (BEAM-6158) Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super'.

    [ https://issues.apache.org/jira/browse/BEAM-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16919945#comment-16919945 ] 

Valentyn Tymofieiev edited comment on BEAM-6158 at 11/22/19 1:40 AM:
---------------------------------------------------------------------

The error happens when the main pipeline module has class methods that refer to superclass methods using super(). A reference to super in a method's code creates a cyclical reference inside the class object, which dill currently handles by pickling objects by reference. That approach does not work when restoring a pickled main session, since the classes need to be defined at the moment of unpickling. This issue will be addressed once [https://github.com/uqfoundation/dill/issues/300] is fixed, or once we start using CloudPickle as the pickler, which is being investigated in BEAM-8123.
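To see where the cycle comes from: in CPython, a method body that mentions the name super (even in the explicit two-argument form) gets an implicit __class__ closure cell that points back at the class, so the chain is class -> method -> cell -> class. The sketch below uses only the standard library (no dill or Beam is involved, and the class names are hypothetical). It inspects that cell and shows that a method written in the SuperClassName.method(self, ...) style has no such cell.

{code:python}
class Base:
    def __init__(self):
        self.initialized = True


class UsesSuper(Base):
    def __init__(self):
        # Mentioning the name `super` in a method body, even in the explicit
        # two-argument form, makes CPython give the method a __class__ cell.
        super(UsesSuper, self).__init__()


class UsesExplicitBase(Base):
    def __init__(self):
        # Workaround style: call the base class directly, no `super` reference.
        Base.__init__(self)


def closure_contents(func):
    """Return the objects held in func's closure cells, or [] if it has none."""
    if func.__closure__ is None:
        return []
    return [cell.cell_contents for cell in func.__closure__]


# The method's closure holds the class, and the class holds the method:
# class -> __init__ -> __class__ cell -> class.
print(UsesSuper.__init__.__code__.co_freevars)               # ('__class__',)
print(closure_contents(UsesSuper.__init__) == [UsesSuper])   # True
print(closure_contents(UsesExplicitBase.__init__))           # []
{code}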

In the meantime, the following workarounds are available:
 - don't use super() in the main module.
 - restructure the pipeline so that the pipeline code does not depend on the entities defined in the main module, and don't pass --save_main_session.
 - refer to superclass methods in the main module via SuperClassName.method(self, ...). This is NOT an equivalent replacement, but may work in simple class hierarchies. [Example|https://github.com/apache/beam/blob/7a8a26b6f1e67c619bfe283492a3f9fe83a983bb/sdks/python/apache_beam/examples/wordcount.py#L43].
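The caveat on the third workaround shows up with cooperative multiple inheritance. In this minimal sketch the classes and the setup() method are hypothetical and are not Beam API. With super(), each call continues along the MRO of the instance's class. Hard-coding the base class instead jumps straight to Base, so a sibling class in a diamond is skipped.

{code:python}
def run_setup(cls):
    """Instantiate cls, call setup(), and return the recorded call order."""
    obj = cls()
    obj.calls = []
    obj.setup()
    return obj.calls


class Base:
    def setup(self):
        self.calls.append("Base")


# Cooperative version: super() follows the MRO of type(self).
class LeftSuper(Base):
    def setup(self):
        self.calls.append("Left")
        super().setup()


class RightSuper(Base):
    def setup(self):
        self.calls.append("Right")
        super().setup()


class ChildSuper(LeftSuper, RightSuper):
    pass


# Workaround version: each class hard-codes its direct base.
class LeftExplicit(Base):
    def setup(self):
        self.calls.append("Left")
        Base.setup(self)


class RightExplicit(Base):
    def setup(self):
        self.calls.append("Right")
        Base.setup(self)


class ChildExplicit(LeftExplicit, RightExplicit):
    pass


print(run_setup(ChildSuper))     # ['Left', 'Right', 'Base']
print(run_setup(ChildExplicit))  # ['Left', 'Base']  (RightExplicit is skipped)
{code}

In a single-inheritance chain both versions produce the same call order, which is why the replacement may be acceptable in simple hierarchies.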



> Using --save_main_session fails on Python 3 when main module has invocations of superclass method using 'super' .
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-6158
>                 URL: https://issues.apache.org/jira/browse/BEAM-6158
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-py-harness
>            Reporter: Mark Liu
>            Assignee: Valentyn Tymofieiev
>            Priority: Major
>          Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> A typical manifestation of this failure, which can be observed on several Beam examples:
> {noformat}
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 164, in <module>                                                
>     run()
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/examples/complete/game/user_score.py", line 158, in run                                                     
>     | 'WriteUserScoreSums' >> beam.io.WriteToText(args.output))
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/pipeline.py", line 426, in __exit__                                                                         
>     self.run().wait_until_finish()
>   File "/usr/local/google/home/valentyn/tmp/r2.14.0_py3.5_env/lib/python3.5/site-packages/apache_beam/runners/dataflow/dataflow_runner.py", line 1338, in wait_until_finish                                       
>     (self.state, getattr(self._runner, 'last_error_msg', None)), self)
> apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:                                                                                            
> Traceback (most recent call last):
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 773, in run
>     self._load_main_session(self.local_staging_directory)
>   File "/usr/local/lib/python3.5/site-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session                                                                                                   
>     pickler.load_session(session_file)
>   File "/usr/local/lib/python3.5/site-packages/apache_beam/internal/pickler.py", line 280, in load_session                                                                                                        
>     return dill.load_session(file_path)
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 410, in load_session
>     module = unpickler.load()
>   File "/usr/local/lib/python3.5/site-packages/dill/_dill.py", line 474, in find_class
>     return StockUnpickler.find_class(self, module, name)
> AttributeError: Can't get attribute 'ParseGameEventFn' on <module 'dataflow_worker.start' from '/usr/local/lib/python3.5/site-packages/dataflow_worker/start.py'> {noformat}
>  
> Note that the example has the following code [1]:
> {code:python}
> class ParseGameEventFn(beam.DoFn):
>   def __init__(self):
>     super(ParseGameEventFn, self).__init__()
> {code}
> [1] https://github.com/apache/beam/blob/0325c360bef17a6673e2d43051e59174b8e5ccc9/sdks/python/apache_beam/examples/complete/game/user_score.py#L81
> +cc: [~tvalentyn] [~robertwb] [~altay]
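The final AttributeError in the traceback comes from the unpickler's find_class lookup. A class pickled by reference is stored only as a module name plus a class name, and unpickling fails if that module does not define the class at load time. The following minimal sketch reproduces that failure mode. It uses the standard pickle module as a stand-in for dill. The module name launcher_main is hypothetical and stands in for the module the worker resolves the name against (dataflow_worker.start in the traceback).

{code:python}
import pickle
import sys
import types

# Hypothetical module standing in for the pipeline's __main__ on the launcher.
launcher = types.ModuleType("launcher_main")
sys.modules["launcher_main"] = launcher
exec("class ParseGameEventFn:\n    pass\n", launcher.__dict__)

# pickle stores the class by reference ("launcher_main", "ParseGameEventFn"),
# not its definition.
payload = pickle.dumps(launcher.ParseGameEventFn())

# Simulate a worker where that module does not define the class.
del launcher.ParseGameEventFn

unpickling_failed = False
try:
    pickle.loads(payload)
except AttributeError as err:
    # Same kind of error as the last line of the traceback.
    unpickling_failed = True
    print("unpickling failed:", err)

print(unpickling_failed)  # True
{code}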


