Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 00:31:44 UTC

[GitHub] [beam] damccorm opened a new issue, #21615: Beam python SDK ignores pickle_library option in pipeline.run()

damccorm opened a new issue, #21615:
URL: https://github.com/apache/beam/issues/21615

   Context:
   
   In the Python SDK, you can specify the pipeline option `--pickle_library`, which selects the library used to pickle variables so they can be sent from the launching machine to the workers (when `save_main_session` is True).
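Here is some background on why the option exists. This is a sketch that uses only the standard library, not Beam code. Plain `pickle` serializes functions by reference, as a module plus a qualified name. An importable function therefore round-trips, but a lambda defined in the main session does not. Libraries such as dill and cloudpickle can serialize such objects by value, which is why the choice of library matters for `save_main_session`. The helper name `stdlib_can_pickle` is hypothetical.

```python
import os.path
import pickle


def stdlib_can_pickle(obj):
    """Hypothetical helper: return True if the stdlib pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Importable functions are pickled by reference (module + qualified name),
# so they only round-trip if the worker can import the same name.
print(stdlib_can_pickle(os.path.join))     # True

# A lambda has no importable name, so by-reference pickling fails.
print(stdlib_can_pickle(lambda x: x * x))  # False
```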
   
   Issue:
   
   The `pickle_library` option is ignored in `pipeline.run()`, which falls back to dill (the default library) when dumping the main session:
   
   https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
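To make the failure mode concrete, here is a hypothetical, heavily simplified model in plain Python. None of these names are Beam's API. It has three parts:

- A pickler registry that defaults to dill.
- A `run_buggy` function that dumps the session without consulting the option.
- A `run_fixed` function that applies `pickle_library` first, which is the shape of the workaround proposed later in this thread.

```python
# Hypothetical, simplified model of the reported bug; this is NOT Beam's code.
# Beam keeps a module-level "selected library"; a small registry object stands
# in for it here so the example has no global side effects.


class PicklerRegistry:
    def __init__(self, default="dill"):
        self.default = default
        self.selected = default

    def set_library(self, name):
        # "default" maps back to the registry default, mirroring the flag values.
        self.selected = self.default if name == "default" else name

    def dump_session(self):
        # Stand-in for pickling the main session: report which library is used.
        return self.selected


def run_buggy(options, registry):
    # Bug: the pickle_library option is never consulted before the dump.
    return registry.dump_session()


def run_fixed(options, registry):
    # Fix: apply the requested library (if any) before dumping the session.
    library = options.get("pickle_library")
    if library:
        registry.set_library(library)
    return registry.dump_session()


opts = {"pickle_library": "cloudpickle", "save_main_session": True}
print(run_buggy(opts, PicklerRegistry()))  # dill
print(run_fixed(opts, PicklerRegistry()))  # cloudpickle
```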
   
   Reproduce:
   
   Add `--pickle_library cloudpickle` to the pipeline options and notice that dill is still used for the session dump, even though cloudpickle was requested.
   
   I found this because dill raises an exception for my use case, but cloudpickle doesn't.
   
   Imported from Jira [BEAM-14514](https://issues.apache.org/jira/browse/BEAM-14514). Original Jira may contain additional context.
   Reported by: dctelus.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] derekocallaghan commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
derekocallaghan commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1290436909

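   As a local workaround, I've made `save_main_session` honour the specified `pickle_library` (in this case, `cloudpickle`) by modifying https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570 as follows:
   
   ```python
   # In Pipeline.run(): honour --pickle_library before dumping the main session.
   try:
     pickle_library = self._options.view_as(SetupOptions).pickle_library
     if pickle_library:
       # Switch the global pickler (e.g. to cloudpickle) before the dump.
       pickler.set_library(pickle_library)
     pickler.dump_session(os.path.join(tmpdir, 'main_session.pickle'))
   finally:
     # Clean up the temporary directory created for the session dump.
     shutil.rmtree(tmpdir)
   ```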


[GitHub] [beam] AnandInguva commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
AnandInguva commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1307957711

   We use a similar workaround in https://github.com/apache/beam/blob/b9655e7de1a682d8ec4efcafb4d610f794e1b40e/sdks/python/apache_beam/runners/portability/stager.py#L207, but it won't catch every runner.
   
   We should have a proper way of declaring the pickling library.


[GitHub] [beam] ryanthompson591 commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1168842019

   .self-assing


[GitHub] [beam] ryanthompson591 commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1168842834

   .take-issue


[GitHub] [beam] damccorm commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
damccorm commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1146712217

   Unable to assign user @ryanthompson591. If able, self-assign, otherwise tag @damccorm so that he can assign you. Because of GitHub's spam prevention system, your activity is required to enable assignment in this repo.


[GitHub] [beam] davidcavazos commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
davidcavazos commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1293791197

   This looks like something Beam should probably adopt. Would it make sense to merge this into the main branch?


[GitHub] [beam] ryanthompson591 commented on issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
ryanthompson591 commented on issue #21615:
URL: https://github.com/apache/beam/issues/21615#issuecomment-1241257708

   This is a problem and I haven't been able to come up with a great solution. The cloudpickle library has to be selected explicitly at the start of the file.
   
   For example:
   `from apache_beam.internal import pickler`, then `pickler.set_library(pickler.USE_CLOUDPICKLE)`


[GitHub] [beam] tvalentyn closed issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()

Posted by GitBox <gi...@apache.org>.
tvalentyn closed issue #21615: Beam python SDK ignores pickle_library option in pipeline.run()
URL: https://github.com/apache/beam/issues/21615

