Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/03/07 16:57:46 UTC

[GitHub] [beam] AnandInguva commented on a change in pull request #16938: [BEAM-13314]Revise recommendations to manage Python pipeline dependencies.

AnandInguva commented on a change in pull request #16938:
URL: https://github.com/apache/beam/pull/16938#discussion_r820906597



##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -171,6 +171,49 @@ creates a Java 8 SDK image with appropriate licenses in `/opt/apache/beam/third_
 
 By default, no licenses/notices are added to the docker images.
 
+#### Build an existing container image to make it compatible with Apache Beam Runners {#modify-existing-base-image}
+Beam offers a way to take a Beam container image and customize it. But if you have an existing base image to be compatible with Apache Beam Runners, use a [multi-stage build](https://docs.docker.com/develop/develop-images/multistage-build/) process to copy over the necessary artifacts from a default Apache Beam base image and provide your custom container image.
+
+
+1. Copy necessary artifacts from Apache Beam base image to your image.
+  ```
+  # This can be any container image,
+ FROM python:3.8-slim

Review comment:
       Thanks for catching that.
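
For context, the multi-stage approach described in the hunk above might look roughly like the sketch below. The quoted hunk is cut off after the `FROM` line, so this is an illustration and not the PR's text. The SDK image name and tag (`apache/beam_python3.8_sdk:2.36.0`), the `apache-beam` version pin, and the reliance on the `/opt/apache/beam` directory and its `boot` entrypoint are all assumptions based on how released Beam Python SDK images are laid out.

```dockerfile
# Any existing base image; python:3.8-slim is the one quoted in the hunk.
FROM python:3.8-slim

# Assumption: the Beam SDK must be installed in the image, at the same
# version as the SDK image copied from below. The version is illustrative.
RUN pip install --no-cache-dir apache-beam==2.36.0

# Assumption: copy the runtime artifacts from a released Beam Python SDK
# image built for the same Python minor version as the base image.
COPY --from=apache/beam_python3.8_sdk:2.36.0 /opt/apache/beam /opt/apache/beam

# Assumption: runners expect the Beam boot binary as the container entrypoint.
ENTRYPOINT ["/opt/apache/beam/boot"]
```

The Python minor version of the base image and of the SDK image should match, otherwise the copied artifacts may not work with the interpreter in the final image.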

##########
File path: website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the `requirements.txt` file. Because of this, it's very important that you delete non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you don't remove non-PyPI packages, the remote workers will fail when attempting to install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like [pip-tools](https://github.com/jazzband/pip-tools) to compile the `requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a [container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image with all the dependencies that are needed for the pipeline instead of `requirements.txt`. [Follow the instructions on how to run pipeline with Custom Container images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at runtime and specify `--requirements_file` option, we recommend you to install the dependencies from the `--requirements_file` when building your container image. In this case, you would reduce the pipeline startup time and do not need to pass `--requirements_file` option at runtime.

Review comment:
       Changed it

##########
File path: website/www/site/content/en/documentation/sdks/python-pipeline-dependencies.md
##########
@@ -45,6 +45,19 @@ If your pipeline uses public packages from the [Python Package Index](https://py
     The runner will use the `requirements.txt` file to install your additional dependencies onto the remote workers.
 
 **Important:** Remote workers will install all packages listed in the `requirements.txt` file. Because of this, it's very important that you delete non-PyPI packages from the `requirements.txt` file, as stated in step 2. If you don't remove non-PyPI packages, the remote workers will fail when attempting to install packages from sources that are unknown to them.
+> **NOTE**: An alternative to `pip check` is to use a library like [pip-tools](https://github.com/jazzband/pip-tools) to compile the `requirements.txt` with all the dependencies required for the pipeline.
+## Custom Containers {#custom-containers}
+
+You can pass a [container](https://hub.docker.com/search?q=apache%2Fbeam&type=image) image with all the dependencies that are needed for the pipeline instead of `requirements.txt`. [Follow the instructions on how to run pipeline with Custom Container images](https://beam.apache.org/documentation/runtime/environments/#running-pipelines).
+
+1. If you are passing a custom container image, `--sdk_container_image` at runtime and specify `--requirements_file` option, we recommend you to install the dependencies from the `--requirements_file` when building your container image. In this case, you would reduce the pipeline startup time and do not need to pass `--requirements_file` option at runtime.
+
+       # Add these lines with the path to the requirements.txt to the Dockerfile
+
+       COPY <path to requirements.txt> /tmp/requirements.txt
+       RUN python -m pip download -r /tmp/requirements.txt
+
+**Note:** [Different approaches](https://beam.apache.org/documentation/runtime/environments/#writing-new-dockerfiles) to build the container images that would be compatible with Apache Beam Runners.

Review comment:
       I thought that referencing how to use custom containers might be useful, but thinking about it, you are right.
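
One detail worth noting about the hunk above: its prose talks about installing the dependencies while building the image, but the quoted command uses `pip download`, which only fetches package archives and does not install them. A minimal sketch using `pip install` instead, assuming a `requirements.txt` next to the Dockerfile and an illustrative base image tag (both are assumptions, not taken from the PR):

```dockerfile
# Assumption: start from a released Beam Python SDK image; the tag is illustrative.
FROM apache/beam_python3.8_sdk:2.36.0

# Copy the pipeline's requirements file into the image.
COPY requirements.txt /tmp/requirements.txt

# Install (not just download) the dependencies at build time so that
# workers do not need to install them at pipeline startup.
RUN python -m pip install --no-cache-dir -r /tmp/requirements.txt
```

With the dependencies baked into the image this way, the pipeline would be launched with `--sdk_container_image` pointing at the built image and without `--requirements_file`, as the hunk describes.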

##########
File path: website/www/site/content/en/documentation/runtime/environments.md
##########
@@ -46,7 +46,7 @@ Beam [SDK container images](https://hub.docker.com/search?q=apache%2Fbeam&type=i
 
 1. **[Writing a new](#writing-new-dockerfiles) Dockerfile based on a released container image**. This is sufficient for simple additions to the image, such as adding artifacts or environment variables.
 2. **[Modifying](#modifying-dockerfiles) a source Dockerfile in [Beam](https://github.com/apache/beam)**. This method requires building from Beam source but allows for greater customization of the container (including replacement of artifacts or base OS/language versions).
-
+3. **[Build](#modify-existing-base-image) an existing container image to make it compatible with Apache Beam Runners**. This method is used when users start from an existing image, and configure the image to be compatible with Apache Beam Runners.

Review comment:
       Done



