Posted to commits@beam.apache.org by yi...@apache.org on 2022/05/02 18:14:11 UTC

[beam] branch master updated: Add website link log to notify user of pre-build workflow. (#17498)

This is an automated email from the ASF dual-hosted git repository.

yichi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 2adf0c68aed Add website link log to notify user of pre-build workflow. (#17498)
2adf0c68aed is described below

commit 2adf0c68aed3480e8cd54139933afb6292574bce
Author: Yichi Zhang <zy...@google.com>
AuthorDate: Mon May 2 11:14:05 2022 -0700

    Add website link log to notify user of pre-build workflow. (#17498)
---
 .../apache_beam/runners/dataflow/dataflow_runner.py | 11 ++++++++++-
 .../sdks/python-pipeline-dependencies.md            | 21 ++-------------------
 2 files changed, 12 insertions(+), 20 deletions(-)

diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index b395508fd12..13cbec6dc02 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -471,10 +471,19 @@ class DataflowRunner(PipelineRunner):
         options.view_as(WorkerOptions).sdk_container_image = (
             self._default_environment.container_image)
       else:
+        artifacts = environments.python_sdk_dependencies(options)
+        if artifacts and apiclient._use_fnapi(options):
+          _LOGGER.info(
+              "Pipeline has additional dependencies to be installed "
+              "in SDK worker container, consider using the SDK "
+              "container image pre-building workflow to avoid "
+              "repetitive installations. Learn more on "
+              "https://cloud.google.com/dataflow/docs/guides/"
+              "using-custom-containers#prebuild")
         self._default_environment = (
             environments.DockerEnvironment.from_container_image(
                 apiclient.get_container_image_from_options(options),
-                artifacts=environments.python_sdk_dependencies(options),
+                artifacts=artifacts,
                 resource_hints=environments.resource_hints_from_options(
                     options)))
 
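The guard added above only emits the hint when the pipeline actually stages extra artifacts and the job uses the FnAPI code path. A minimal stand-alone sketch of that decision (hypothetical helper name; the real check lives inside `DataflowRunner` and uses `apiclient._use_fnapi`):

```python
import logging

_LOGGER = logging.getLogger(__name__)

PREBUILD_DOC_URL = (
    "https://cloud.google.com/dataflow/docs/guides/"
    "using-custom-containers#prebuild")


def maybe_log_prebuild_hint(artifacts, use_fnapi):
    """Log the pre-build hint only when extra dependencies would be
    installed at worker startup, mirroring the guard in the commit.

    Returns True when the hint was logged, False otherwise."""
    if artifacts and use_fnapi:
        _LOGGER.info(
            "Pipeline has additional dependencies to be installed in the "
            "SDK worker container; consider using the SDK container image "
            "pre-building workflow to avoid repetitive installations. "
            "Learn more at %s", PREBUILD_DOC_URL)
        return True
    return False
```

Note the order of checks matters in the real code as well: `artifacts` is computed first and then reused in the `DockerEnvironment`, so the dependency list is resolved only once.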
diff --git a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
index 4df2374cf47..ca91194c533 100644
--- a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
+++ b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
@@ -138,24 +138,7 @@ If your pipeline uses non-Python packages (e.g. packages that require installati
 ## Pre-building SDK container image
 
 In pipeline execution modes where a Beam runner launches SDK workers in Docker containers, the additional pipeline dependencies (specified via `--requirements_file` and other runtime options) are installed into the containers at runtime. This can increase the worker startup time.
- However, it may be possible to pre-build the SDK containers and perform the dependency installation once before the workers start. To pre-build the container image before pipeline submission, provide the pipeline options mentioned below.
-1. Provide the container engine. Beam supports `local_docker`(requires local installation of Docker) and `cloud_build`(requires a GCP project with Cloud Build API enabled).
-
-       --prebuild_sdk_container_engine=<container_engine>
-
-2. If using `local_docker` engine, provide a URL for the remote registry to which the image will be pushed by passing
-
-       --docker_registry_push_url=<remote_registry_url>
-       # Example: --docker_registry_push_url=<registry_name>/beam
-       # pre-built image will be pushed to the <registry_name>/beam/beam_python_prebuilt_sdk:<unique_image_tag>
-       # <unique_image_tag> tag is generated by Beam SDK.
-
-   **NOTE:** `docker_registry_push_url` must be a remote registry.
-> The pre-building feature requires the Apache Beam SDK for Python, version 2.25.0 or later.
-The container images created during prebuilding will persist beyond the pipeline runtime.
-Once your job is finished or stopped, you can remove the pre-built image from the container registry.
-
->If your pipeline is using a custom container image, most likely you will not benefit from pre-building step as extra dependencies can be preinstalled in the custom image at build time. If you still would like to use pre-building with custom images, use Apache Beam SDK 2.38.0 or newer and
- supply your custom image via `--sdk_container_image` pipeline option.
+However, it may be possible to pre-build the SDK containers and perform the dependency installation once, before the workers start, by using `--prebuild_sdk_container_engine`. For instructions on how to use pre-building with Google Cloud
+Dataflow, see [Pre-building the Python SDK custom container image with extra dependencies](https://cloud.google.com/dataflow/docs/guides/using-custom-containers#prebuild).
 
 **NOTE**: This feature is available only for the `Dataflow Runner v2`.
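Putting the flag from the docs change together with a normal Dataflow submission, a command might look like the following sketch (the script name, project, region, and bucket are placeholders, not values from this commit):

```shell
# Hypothetical submission command; my_pipeline.py, my-gcp-project, and
# gs://my-bucket are placeholders. cloud_build requires a GCP project
# with the Cloud Build API enabled.
python my_pipeline.py \
  --runner=DataflowRunner \
  --project=my-gcp-project \
  --region=us-central1 \
  --temp_location=gs://my-bucket/tmp \
  --requirements_file=requirements.txt \
  --prebuild_sdk_container_engine=cloud_build
```

With these options, the dependencies listed in `requirements.txt` are installed into the SDK container image once at submission time instead of on every worker at startup.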