Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2022/05/26 16:58:00 UTC

[jira] [Updated] (BEAM-14514) Beam python SDK ignores pickle_library option in pipeline.run()

     [ https://issues.apache.org/jira/browse/BEAM-14514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Valentyn Tymofieiev updated BEAM-14514:
---------------------------------------
    Status: Open  (was: Triage Needed)

> Beam python SDK ignores pickle_library option in pipeline.run()
> ---------------------------------------------------------------
>
>                 Key: BEAM-14514
>                 URL: https://issues.apache.org/jira/browse/BEAM-14514
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-py-core
>    Affects Versions: 2.38.0
>            Reporter: dctelus
>            Assignee: Ryan Thompson
>            Priority: P2
>
> Context:
> In the Python SDK, the pipeline option --pickle_library selects which library is used to pickle variables and send them from the submitting machine to the workers (when save_main_session is True).
> Issue:
> The pickle_library option is ignored by pipeline.run(), which falls back to dill (the default pickler) when dumping the main session.
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
> Reproduce:
> Add --pickle_library cloudpickle (together with --save_main_session) to the pipeline options and observe that dill is still used for the session dump, even though cloudpickle was requested.
>
> I found this because dill raises an exception for my use case, but cloudpickle does not.
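
The Context paragraph describes a pipeline flag that selects the pickler. A minimal stdlib sketch of how such flags could be parsed, assuming the flag names from this issue; `parse_setup_flags` is a hypothetical helper, not Beam's parser (Beam's real options are defined in apache_beam.options.pipeline_options), and the list of accepted values is an assumption.

```python
import argparse


def parse_setup_flags(argv):
    """Parse a hypothetical subset of Beam's setup flags with argparse.

    This is a stand-in, not Beam's own parser; the choices below are an
    assumption modelled on the values discussed in this issue.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pickle_library",
        default="default",
        choices=["default", "dill", "cloudpickle"],
        help="Library used to pickle the main session for the workers.",
    )
    parser.add_argument(
        "--save_main_session",
        action="store_true",
        help="Pickle the main session so workers can see its globals.",
    )
    # parse_known_args leaves unrelated pipeline flags (e.g. --runner) alone.
    known, _unknown = parser.parse_known_args(argv)
    return known


opts = parse_setup_flags(
    ["--pickle_library", "cloudpickle", "--save_main_session",
     "--runner", "DirectRunner"])
print(opts.pickle_library, opts.save_main_session)
```

Parsing the flag is the easy part; the bug report is about the value not being consulted afterwards, when run() dumps the session.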

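The report amounts to a run() path that hard-codes the default pickler instead of reading the option. A minimal, hypothetical sketch of that bug pattern and a corrected lookup, in plain Python with stub picklers standing in for dill and cloudpickle (`SketchPipeline`, `run_buggy` and `run_fixed` are illustrative names, not Beam APIs, and this is not the actual Beam fix):

```python
import io
import pickle


def make_stub_pickler(name, calls):
    """Return a hypothetical stand-in for dill/cloudpickle.

    Assumption: the real libraries expose dump(obj, file). The stub records
    its name in `calls` and delegates to the stdlib pickle module so the
    sketch runs without third-party packages.
    """
    def dump(obj, file):
        calls.append(name)
        pickle.dump(obj, file)
    return dump


class SketchPipeline:
    """Hypothetical pipeline showing the reported bug and one way to fix it."""

    def __init__(self, pickle_library="default"):
        self.calls = []
        self.picklers = {
            "dill": make_stub_pickler("dill", self.calls),
            "cloudpickle": make_stub_pickler("cloudpickle", self.calls),
        }
        # "default" resolves to dill, matching the behaviour in the report.
        self.pickle_library = (
            "dill" if pickle_library == "default" else pickle_library)

    def run_buggy(self, session):
        # Bug pattern: the session dump ignores the option and uses dill.
        self.picklers["dill"](session, io.BytesIO())

    def run_fixed(self, session):
        # Fix pattern: dump with the library selected in the options.
        self.picklers[self.pickle_library](session, io.BytesIO())


p = SketchPipeline(pickle_library="cloudpickle")
p.run_buggy({"x": 1})
p.run_fixed({"x": 1})
print(p.calls)  # ['dill', 'cloudpickle']
```

Resolving the library name once and routing every session dump through that lookup keeps the choice in a single place, so the dump in run() cannot silently diverge from the user's option.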


--
This message was sent by Atlassian Jira
(v8.20.7#820007)