Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2022/05/26 16:58:00 UTC
[jira] [Updated] (BEAM-14514) Beam python SDK ignores pickle_library option in pipeline.run()
[ https://issues.apache.org/jira/browse/BEAM-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Valentyn Tymofieiev updated BEAM-14514:
---------------------------------------
Status: Open (was: Triage Needed)
> Beam python SDK ignores pickle_library option in pipeline.run()
> ---------------------------------------------------------------
>
> Key: BEAM-14514
> URL: https://issues.apache.org/jira/browse/BEAM-14514
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.38.0
> Reporter: dctelus
> Assignee: Ryan Thompson
> Priority: P2
>
> Context:
> In the Python SDK, the pipeline option --pickle_library selects the library used to pickle variables so they can be sent from the executing machine to the workers (when save_main_session is True).
> Issue:
> The pickle_library option is ignored in the pipeline.run() function, which falls back to dill (the default library).
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
> Reproduce:
> Add --pickle_library cloudpickle to the pipeline options and notice that dill is still used for the session dump, even though cloudpickle was requested.
>
> I found this out because dill throws an exception for my use case, but cloudpickle doesn't.
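
The Reproduce step above could be sketched roughly as follows. This is a minimal illustration, not the reporter's actual pipeline. The flag names come from the report. The helper names repro_flags and run_repro are hypothetical. Reading the options through SetupOptions assumes that class exposes pickle_library and save_main_session in the affected SDK version. Beam is imported lazily, so the flag helper works without it installed.

```python
# Sketch only: hypothetical helpers illustrating the report's Reproduce step.

def repro_flags(pickle_library="cloudpickle"):
    """Return the pipeline flags described in the report."""
    return [
        "--pickle_library=" + pickle_library,
        "--save_main_session",
    ]


def run_repro():
    """Run a trivial pipeline with the flags above (requires apache_beam)."""
    # Imported here so repro_flags() stays usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        SetupOptions,
    )

    options = PipelineOptions(repro_flags())

    # Assumption: SetupOptions carries both options in the affected version.
    setup = options.view_as(SetupOptions)
    print("pickle_library:", setup.pickle_library)
    print("save_main_session:", setup.save_main_session)

    pipeline = beam.Pipeline(options=options)
    _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)

    # Per the report, the session dump performed inside run() still uses
    # dill here, regardless of the pickle_library value printed above.
    pipeline.run().wait_until_finish()
```

Calling run_repro() from a script whose main session contains an object that dill cannot serialize should surface the behaviour described above; with a trivial main session the run may succeed without showing any difference.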
--
This message was sent by Atlassian Jira
(v8.20.7#820007)