Posted to commits@beam.apache.org by yi...@apache.org on 2022/05/02 18:14:11 UTC
[beam] branch master updated: Add website link log to notify user of pre-build workflow. (#17498)
This is an automated email from the ASF dual-hosted git repository.
yichi pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/master by this push:
new 2adf0c68aed Add website link log to notify user of pre-build workflow. (#17498)
2adf0c68aed is described below
commit 2adf0c68aed3480e8cd54139933afb6292574bce
Author: Yichi Zhang <zy...@google.com>
AuthorDate: Mon May 2 11:14:05 2022 -0700
Add website link log to notify user of pre-build workflow. (#17498)
---
.../apache_beam/runners/dataflow/dataflow_runner.py | 11 ++++++++++-
.../sdks/python-pipeline-dependencies.md | 21 ++-------------------
2 files changed, 12 insertions(+), 20 deletions(-)
diff --git a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
index b395508fd12..13cbec6dc02 100644
--- a/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
+++ b/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py
@@ -471,10 +471,19 @@ class DataflowRunner(PipelineRunner):
options.view_as(WorkerOptions).sdk_container_image = (
self._default_environment.container_image)
else:
+ artifacts = environments.python_sdk_dependencies(options)
+ if artifacts and apiclient._use_fnapi(options):
+ _LOGGER.info(
+ "Pipeline has additional dependencies to be installed "
+ "in SDK worker container, consider using the SDK "
+ "container image pre-building workflow to avoid "
+ "repetitive installations. Learn more on "
+ "https://cloud.google.com/dataflow/docs/guides/"
+ "using-custom-containers#prebuild")
self._default_environment = (
environments.DockerEnvironment.from_container_image(
apiclient.get_container_image_from_options(options),
- artifacts=environments.python_sdk_dependencies(options),
+ artifacts=artifacts,
resource_hints=environments.resource_hints_from_options(
options)))
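The hunk above adds an informational log before falling back to the default Docker environment: when the pipeline has extra dependencies and runs under the FnAPI, the user is pointed at the pre-building docs. A minimal standalone sketch of that conditional (with `maybe_log_prebuild_hint`, `artifacts`, and `use_fnapi` as hypothetical stand-ins for `environments.python_sdk_dependencies(options)` and `apiclient._use_fnapi(options)`):

```python
import logging

_LOGGER = logging.getLogger(__name__)

# URL taken verbatim from the commit above.
PREBUILD_DOC_URL = (
    "https://cloud.google.com/dataflow/docs/guides/"
    "using-custom-containers#prebuild")


def maybe_log_prebuild_hint(artifacts, use_fnapi):
    """Log a pointer to the pre-build workflow when extra deps exist.

    Returns True if the hint was logged, mirroring the condition the
    commit adds (non-empty artifacts AND FnAPI execution).
    """
    if artifacts and use_fnapi:
        _LOGGER.info(
            "Pipeline has additional dependencies to be installed "
            "in SDK worker container, consider using the SDK "
            "container image pre-building workflow to avoid "
            "repetitive installations. Learn more on %s",
            PREBUILD_DOC_URL)
        return True
    return False
```

Note the hint is advisory only: the artifacts are still passed to `DockerEnvironment.from_container_image` unchanged, so behavior is identical apart from the log line.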
diff --git a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
index 4df2374cf47..ca91194c533 100644
--- a/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
+++ b/website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
@@ -138,24 +138,7 @@ If your pipeline uses non-Python packages (e.g. packages that require installati
## Pre-building SDK container image
In pipeline execution modes where a Beam runner launches SDK workers in Docker containers, the additional pipeline dependencies (specified via `--requirements_file` and other runtime options) are installed into the containers at runtime. This can increase the worker startup time.
- However, it may be possible to pre-build the SDK containers and perform the dependency installation once before the workers start. To pre-build the container image before pipeline submission, provide the pipeline options mentioned below.
-1. Provide the container engine. Beam supports `local_docker`(requires local installation of Docker) and `cloud_build`(requires a GCP project with Cloud Build API enabled).
-
- --prebuild_sdk_container_engine=<container_engine>
-
-2. If using `local_docker` engine, provide a URL for the remote registry to which the image will be pushed by passing
-
- --docker_registry_push_url=<remote_registry_url>
- # Example: --docker_registry_push_url=<registry_name>/beam
- # pre-built image will be pushed to the <registry_name>/beam/beam_python_prebuilt_sdk:<unique_image_tag>
- # <unique_image_tag> tag is generated by Beam SDK.
-
- **NOTE:** `docker_registry_push_url` must be a remote registry.
-> The pre-building feature requires the Apache Beam SDK for Python, version 2.25.0 or later.
-The container images created during prebuilding will persist beyond the pipeline runtime.
-Once your job is finished or stopped, you can remove the pre-built image from the container registry.
-
->If your pipeline is using a custom container image, most likely you will not benefit from pre-building step as extra dependencies can be preinstalled in the custom image at build time. If you still would like to use pre-building with custom images, use Apache Beam SDK 2.38.0 or newer and
- supply your custom image via `--sdk_container_image` pipeline option.
+However, it may be possible to pre-build the SDK containers and perform the dependency installation once before the workers start with `--prebuild_sdk_container_engine`. For instructions on how to use pre-building with Google Cloud
+Dataflow, see [Pre-building the Python SDK custom container image with extra dependencies](https://cloud.google.com/dataflow/docs/guides/using-custom-containers#prebuild).
**NOTE**: This feature is available only for the `Dataflow Runner v2`.
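The doc change replaces the step-by-step flag walkthrough with a link; for context, a prebuild-enabled submission using the flags the removed text described might look like the following (the pipeline file, project, and registry path are placeholders, not taken from this commit):

```shell
# Hypothetical invocation: my_pipeline.py and gcr.io/my-project/beam are
# placeholders. local_docker requires a local Docker installation and a
# remote registry reachable via --docker_registry_push_url; cloud_build
# instead requires a GCP project with the Cloud Build API enabled.
python my_pipeline.py \
  --runner=DataflowRunner \
  --requirements_file=requirements.txt \
  --prebuild_sdk_container_engine=local_docker \
  --docker_registry_push_url=gcr.io/my-project/beam
```

Per the removed text, the pre-built image persists in the registry after the job finishes and can be deleted manually once no longer needed.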