Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 13:51:53 UTC
[GitHub] [beam] damccorm opened a new issue, #19888: Python 3 pipeline fails with errors in StockUnpickler.find_class() during loading a main session.
damccorm opened a new issue, #19888:
URL: https://github.com/apache/beam/issues/19888
When running an Apache Beam pipeline with Python 3 on Google Cloud Dataflow, the pipeline fails during pickler.load_session(session_file):
StockUnpickler.find_class(self, module, name) AttributeError: Can't get attribute 'SomeAttribute' on <module 'dataflow_worker.start' from '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
This is different from BEAM-8651: here the error happens in a batch pipeline on the Dataflow runner, and it happens consistently.
When the same pipeline is tested on the local DirectRunner, there seems to be no issue.
```python
import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

# flatten_filter_actions, add_conversion_name, JsonCoder, TransformConversionMetadata,
# ReadFacebookJson, WriteCoerced and the BIGQUERY_SCHEMA_* paths are not shown in the report.


class FlattenCustomActions(beam.PTransform):
    """Transforms Facebook Day Actions.

    Only retains actions with custom_conversions,
    flattens the actions, and
    adds custom conversion names using a side input.
    """

    def __init__(self, conversions):
        super(FlattenCustomActions, self).__init__()
        # Side input (e.g. beam.pvalue.AsList) holding the conversion metadata.
        self.conversions = conversions

    def expand(self, input_or_inputs):
        return (
            input_or_inputs
            | "FlattenActions" >> beam.ParDo(flatten_filter_actions)
            | "AddConversionName" >> beam.Map(add_conversion_name, self.conversions)
        )

# ...

# in run():
pipeline_options = PipelineOptions(pipeline_args)
# Save the __main__ session so that names defined there can be unpickled on the workers.
pipeline_options.view_as(SetupOptions).save_main_session = True
p = beam.Pipeline(options=pipeline_options)

conversions_output = (
    p
    | "ReadConversions" >> ReadFromText(known_args.input_conversions, coder=JsonCoder())
    | TransformConversionMetadata()
)

(
    conversions_output
    | "WriteConversions"
    >> WriteCoerced(
        known_args.output_conversions,
        known_args.output_type,
        schema_path=BIGQUERY_SCHEMA_CONVERSIONS_PATH,
    )
)

(
    p
    | ReadFacebookJson(known_args.input, retain_root_fields=True)
    | FlattenCustomActions(beam.pvalue.AsList(conversions_output))
    | "WriteActions"
    >> WriteCoerced(
        known_args.output,
        known_args.output_type,
        schema_path=BIGQUERY_SCHEMA_ACTIONS_PATH,
    )
)
```
I receive the following traceback in Dataflow:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 773, in run
    self._load_main_session(self.local_staging_directory)
  File "/usr/local/lib/python3.6/site-packages/dataflow_worker/batchworker.py", line 489, in _load_main_session
    pickler.load_session(session_file)
  File "/usr/local/lib/python3.6/site-packages/apache_beam/internal/pickler.py", line 287, in load_session
    return dill.load_session(file_path)
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 410, in load_session
    module = unpickler.load()
  File "/usr/local/lib/python3.6/site-packages/dill/_dill.py", line 474, in find_class
    return StockUnpickler.find_class(self, module, name)
AttributeError: Can't get attribute 'FlattenCustomActions' on <module 'dataflow_worker.start' from '/usr/local/lib/python3.6/site-packages/dataflow_worker/start.py'>
```
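The last frame is where the unpickler resolves a class reference: pickle records a class by its module name and attribute name rather than by its definition, and `find_class` imports that module and looks the name up on it. The sketch below is a simplified, stdlib-only illustration of that failure mode, not a reproduction of dill's `load_session` or of the Dataflow worker; `launcher_main` and `try_load` are hypothetical names introduced for the example.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the launching process's __main__ module.
launcher_main = types.ModuleType("launcher_main")
sys.modules["launcher_main"] = launcher_main

# Simplified stand-in for the user's class, recorded as living in launcher_main.
FlattenCustomActions = type("FlattenCustomActions", (), {"__module__": "launcher_main"})
launcher_main.FlattenCustomActions = FlattenCustomActions

# pickle stores the reference ("launcher_main", "FlattenCustomActions"),
# not the class body itself.
payload = pickle.dumps(FlattenCustomActions())


def try_load(data):
    """Unpickle data; return the AttributeError raised while resolving the class, or None."""
    try:
        pickle.loads(data)
    except AttributeError as exc:
        return exc
    return None


# While the module still defines the class, the reference resolves.
ok = try_load(payload)

# Simulate a process where a module with that name exists but lacks the class.
del launcher_main.FlattenCustomActions
error = try_load(payload)

print("before:", ok)
print("after:", type(error).__name__, error)
```

The first load succeeds because `launcher_main` still defines the class; once the attribute is gone, the same bytes fail with an `AttributeError` naming `FlattenCustomActions`. In the report, the module being searched is `dataflow_worker.start`, presumably acting as `__main__` on the worker, and dill's `find_class` delegates to `StockUnpickler.find_class`, as the last frame shows.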
Imported from Jira [BEAM-8441](https://issues.apache.org/jira/browse/BEAM-8441). Original Jira may contain additional context.
Reported by: Jannik.Franz@umusic.com.