Posted to github@beam.apache.org by "svetakvsundhar (via GitHub)" <gi...@apache.org> on 2023/04/12 17:56:56 UTC

[GitHub] [beam] svetakvsundhar commented on a diff in pull request #26236: Pickling and Savemainsession Doc update

svetakvsundhar commented on code in PR #26236:
URL: https://github.com/apache/beam/pull/26236#discussion_r1164468409


##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -141,3 +141,12 @@ However, it may be possible to pre-build the SDK containers and perform the depe
 Dataflow, see [Pre-building the python SDK custom container image with extra dependencies](https://cloud.google.com/dataflow/docs/guides/using-custom-containers#prebuild).
 
 **NOTE**: This feature is available only for the `Dataflow Runner v2`.
+
+## Pickling and Managing Main Session
+
+Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Dataflow job.
+Thus, one might encounter an unexpected `NameError` when running a `DoFn` on Dataflow Runner. To resolve this, manage the main session by
+simply setting `--save_main_session=True`. This will load the pickled state of the global namespace onto the Dataflow workers.
+For more information, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors).
+
+**NOTE**: This strictly applies to the `Python SDK executing with the dill pickler on the Dataflow Runner`.

Review Comment:
   Ah, thanks for the catch! Updating to make it clearer: this doesn't apply on `DirectRunner`.


