Posted to github@beam.apache.org by "tvalentyn (via GitHub)" <gi...@apache.org> on 2023/04/20 12:44:05 UTC

[GitHub] [beam] tvalentyn commented on a diff in pull request #26331: Adding info on picklers to docs [follow-on]

tvalentyn commented on code in PR #26331:
URL: https://github.com/apache/beam/pull/26331#discussion_r1172518937


##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 
 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.

Review Comment:
   ```suggestion
   setting the `--save_main_session` pipeline option. This will load the pickled state of the global namespace onto the Dataflow workers.
   ```
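
   For context, setting this option in user code looks roughly like the following minimal sketch (the pipeline contents are illustrative, not from the PR):

   ```python
   # Pass --save_main_session so the pickled state of the global
   # namespace is shipped to remote workers along with the job.
   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions

   # Equivalent to passing --save_main_session on the command line.
   options = PipelineOptions(save_main_session=True)

   with beam.Pipeline(options=options) as p:
       p | beam.Create([1, 2, 3]) | beam.Map(print)
   ```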



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 
 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.
 For example, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors) to set the main session on the `DataflowRunner`.
 
+The dill pickler is the default pickler in the Python SDK.

Review Comment:
   ```suggestion
   ```



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 
 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by
-simply setting the main session. This will load the pickled state of the global namespace onto the Dataflow workers.
+simply setting the main session, `--save_main_session`. This will load the pickled state of the global namespace onto the Dataflow workers.
 For example, see [Handling NameErrors](https://cloud.google.com/dataflow/docs/guides/common-errors#how-do-i-handle-nameerrors) to set the main session on the `DataflowRunner`.
 
+The dill pickler is the default pickler in the Python SDK.
+
 **NOTE**: This applies to the Python SDK executing with the dill pickler on any remote runner using portability. Therefore, this issue will

Review Comment:
   ```suggestion
   **NOTE**: This applies to the Python SDK executing with the dill pickler on any remote runner. Therefore, this issue will
   ```
   
   (removing dev jargon that may be confusing to users)
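
   Since dill is the default, a related minimal sketch: in SDK versions that expose the `pickle_library` pipeline option (an assumption about the reader's Beam version), the pickler can be switched away from dill, for example to cloudpickle:

   ```python
   # Minimal sketch: choose the pickler via the pickle_library pipeline
   # option (assumes an SDK version that supports it); dill is the default.
   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions

   options = PipelineOptions(pickle_library='cloudpickle')

   with beam.Pipeline(options=options) as p:
       p | beam.Create(['a', 'b']) | beam.Map(str.upper) | beam.Map(print)
   ```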



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 
 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.
 Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner using portability. To resolve this, manage the main session by

Review Comment:
   ```suggestion
   Thus, one might encounter an unexpected `NameError` when running a `DoFn` on any remote runner. To resolve this, supply the main session content with the pipeline by
   ```
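
   To make the failure mode concrete, a minimal illustrative sketch (not from the PR) of a `DoFn` that can hit this `NameError` on a remote worker when the main session is not saved:

   ```python
   import logging

   import apache_beam as beam

   class LogElements(beam.DoFn):
       def process(self, element):
           # `logging` was imported in the main session. Without
           # --save_main_session, that import is not recreated on the
           # remote worker, so this line raises NameError there.
           logging.info('element: %s', element)
           yield element
   ```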



##########
website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md:
##########
@@ -146,8 +146,12 @@ Dataflow, see [Pre-building the python SDK custom container image with extra dep
 
 Pickling in the Python SDK is set up to pickle the state of the global namespace. By default, global imports, functions, and variables defined in the main session are not saved during the serialization of a Beam job.

Review Comment:
   ```suggestion
   When the Python SDK submits the pipeline for execution to a remote runner, the pipeline contents, such as transform user code, are serialized (or pickled) into a byte stream using libraries that perform the serialization (also called picklers). The default pickler library used by Beam is `dill`. By default, global imports, functions, and variables defined in the main pipeline module are not saved during the serialization of a Beam job.
   ```
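
   As a hedged illustration of the serialization step described above, plain `dill` usage outside of Beam (the lambda is illustrative):

   ```python
   import dill

   # dill can serialize objects the standard pickler cannot, such as
   # lambdas and functions defined in the main module.
   double = lambda x: x * 2
   payload = dill.dumps(double)    # serialize to a byte stream
   restored = dill.loads(payload)  # reconstruct on the "worker" side
   assert restored(21) == 42
   ```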


