You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/11/15 17:31:35 UTC

[GitHub] [beam] ryanthompson591 commented on a change in pull request #15927: Generate Python container dependencies in an automated way.

ryanthompson591 commented on a change in pull request #15927:
URL: https://github.com/apache/beam/pull/15927#discussion_r749523304



##########
File path: sdks/python/container/Dockerfile
##########
@@ -38,11 +36,10 @@ RUN apt-get update && \
 # Install required packages for Beam Python SDK and common dependencies used by users.
 ####
 
-# SDK dependencies not listed in base_image_requirements.txt will be installed
-# when we install SDK with pip below.
 COPY target/base_image_requirements.txt /tmp/base_image_requirements.txt
 RUN \
-    pip install -r /tmp/base_image_requirements.txt && \
+    # use --no-deps to ensure the list includes all transitive dependencies.
+    pip install --no-deps -r /tmp/base_image_requirements.txt && \

Review comment:
       Could it be the case that future deps will cause future issues?  I mean if future deps come in will that be a problem?

##########
File path: sdks/python/container/Dockerfile
##########
@@ -67,13 +64,33 @@ RUN ln -s /usr/bin/ccache /usr/local/bin/gcc
 RUN ccache --set-config=sloppiness=file_macro && ccache --set-config=hash_dir=false
 
 ####
-# Install Apache Beam SDK
+# Install Apache Beam SDK. Use --no-deps and pip check to verify that all
+# necessary dependencies are specified in base_image_requiremetns.txt.
 ####
 COPY target/apache-beam.tar.gz /opt/apache/beam/tars/
-RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp]
+RUN pip install --no-deps -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp]
+RUN pip check

Review comment:
       So if this fails, will the whole docker install fail?

##########
File path: sdks/python/container/Dockerfile
##########
@@ -67,13 +64,33 @@ RUN ln -s /usr/bin/ccache /usr/local/bin/gcc
 RUN ccache --set-config=sloppiness=file_macro && ccache --set-config=hash_dir=false
 
 ####
-# Install Apache Beam SDK
+# Install Apache Beam SDK. Use --no-deps and pip check to verify that all
+# necessary dependencies are specified in base_image_requiremetns.txt.
 ####
 COPY target/apache-beam.tar.gz /opt/apache/beam/tars/
-RUN pip install -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp]
+RUN pip install --no-deps -v /opt/apache/beam/tars/apache-beam.tar.gz[gcp]
+RUN pip check
 
-COPY target/license_scripts /tmp/license_scripts/
+COPY target/LICENSE /opt/apache/beam/
+COPY target/LICENSE.python /opt/apache/beam/
+COPY target/NOTICE /opt/apache/beam/
+COPY target/launcher/linux_amd64/boot /opt/apache/beam/
+
+# Log complete list of what exact packages and versions are installed.
+RUN pip freeze --all
+# Make sure there are no conflicting dependencies.
+RUN pip check

Review comment:
       we called run check up above. Can we combine these?

##########
File path: sdks/python/container/run_generate_requirements.sh
##########
@@ -0,0 +1,70 @@
+#!/bin/bash
+#
+#    Licensed to the Apache Software Foundation (ASF) under one or more
+#    contributor license agreements.  See the NOTICE file distributed with
+#    this work for additional information regarding copyright ownership.
+#    The ASF licenses this file to You under the Apache License, Version 2.0
+#    (the "License"); you may not use this file except in compliance with
+#    the License.  You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#    Unless required by applicable law or agreed to in writing, software
+#    distributed under the License is distributed on an "AS IS" BASIS,
+#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#    See the License for the specific language governing permissions and
+#    limitations under the License.
+#
+
+if [[ $# != 2 ]]; then
+  printf "Usage: \n$> ./sdks/python/container/run_generate_requirements.sh <python_version> <sdk_tarball>"
+  printf "\n\tpython_version: [required] Python version to generate dependencies for."
+  printf " Use 3.7 for Python3.7, 3.8 for Python3.8 etc."
+fi
+
+set -ex
+PY_VERSION=$1
+SDK_TARBALL=$2
+
+ENV_PATH="$PWD/build/python${PY_VERSION/./}_requirements_gen"
+rm -rf $ENV_PATH 2>/dev/null || true
+python${PY_VERSION} -m venv $ENV_PATH
+source $ENV_PATH/bin/activate
+pip install --upgrade pip
+pip install wheel
+
+pip install --no-cache-dir $SDK_TARBALL[gcp,dataframe]
+pip install --no-cache-dir -r $PWD/sdks/python/container/base_image_requirements_manual.txt
+pip uninstall -y apache-beam
+echo "Checking for broken dependencies:"
+pip check
+echo "Installed dependencies:"
+pip freeze
+
+# Take the version from intepreter instead of $1 to make sure they match. Sample value: py36.
+PY_IMAGE=$(python -c 'import sys; print(f"py{sys.version_info.major}{sys.version_info.minor}")')
+REQUIREMENTS_FILE=$PWD/sdks/python/container/$PY_IMAGE/base_image_requirements.txt
+cat <<EOT > $REQUIREMENTS_FILE
+#    Licensed to the Apache Software Foundation (ASF) under one or more

Review comment:
       Ideally this license file would just live somewhere and you could reference it instead of having all this text here.

##########
File path: sdks/python/container/base_image_requirements_manual.txt
##########
@@ -0,0 +1,41 @@
+###############################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###############################################################################
+
+# This file has additional packages to install in Beam's containers.
+# Do not specify dependencies, which are already in Beam's setup.py.
+# See generate_pip_requirements_list.sh to find which extras (e.g. gcp) are used.
+# Do not specify a constrained version without a particular reason, so that
+# we can pick up recent versions when regenerating the requirements.
+# Consider constraining requirements of Beam itself when necessary.
+
+bs4  # Commonly used HTML processing tool.
+# Let's upgrade to Cython 3.0 in a controlled way, stay on 0.x versions for now.
+cython<1
+google-cloud-profiler
+google-python-cloud-debugger
+guppy3  # Memory profiler
+mmh3  # Optimzes execution of some Beam codepaths. TODO: Make it Beam's dependency.
+python-snappy  # Optimzes execution of some Beam codepaths.
+scipy
+scikit-learn

Review comment:
       I'm just curious why some of these packages like scikit-learn and tensorflow are specified as deps.  Seems like they shouldn't be.

##########
File path: sdks/python/container/build.gradle
##########
@@ -75,6 +75,12 @@ task pushAll {
   dependsOn ':sdks:python:container:py38:dockerPush'
 }
 
+task generatePythonRequirementsAll {
+  dependsOn ':sdks:python:container:py36:generatePythonRequirements'

Review comment:
       just curious if the requirements are now different between python versions.  Why is that now the case?

##########
File path: sdks/python/container/run_generate_requirements.sh
##########
@@ -0,0 +1,70 @@
+#!/bin/bash
+#
+#    Licensed to the Apache Software Foundation (ASF) under one or more
+#    contributor license agreements.  See the NOTICE file distributed with
+#    this work for additional information regarding copyright ownership.
+#    The ASF licenses this file to You under the Apache License, Version 2.0
+#    (the "License"); you may not use this file except in compliance with
+#    the License.  You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#    Unless required by applicable law or agreed to in writing, software
+#    distributed under the License is distributed on an "AS IS" BASIS,
+#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#    See the License for the specific language governing permissions and
+#    limitations under the License.
+#
+
+if [[ $# != 2 ]]; then
+  printf "Usage: \n$> ./sdks/python/container/run_generate_requirements.sh <python_version> <sdk_tarball>"
+  printf "\n\tpython_version: [required] Python version to generate dependencies for."
+  printf " Use 3.7 for Python3.7, 3.8 for Python3.8 etc."
+fi
+
+set -ex
+PY_VERSION=$1
+SDK_TARBALL=$2
+
+ENV_PATH="$PWD/build/python${PY_VERSION/./}_requirements_gen"
+rm -rf $ENV_PATH 2>/dev/null || true
+python${PY_VERSION} -m venv $ENV_PATH
+source $ENV_PATH/bin/activate
+pip install --upgrade pip
+pip install wheel
+
+pip install --no-cache-dir $SDK_TARBALL[gcp,dataframe]
+pip install --no-cache-dir -r $PWD/sdks/python/container/base_image_requirements_manual.txt
+pip uninstall -y apache-beam
+echo "Checking for broken dependencies:"
+pip check
+echo "Installed dependencies:"
+pip freeze
+
+# Take the version from intepreter instead of $1 to make sure they match. Sample value: py36.
+PY_IMAGE=$(python -c 'import sys; print(f"py{sys.version_info.major}{sys.version_info.minor}")')

Review comment:
       I don't get it.  What happens if they don't match. Do we trust what is here instead of the user parameter. What if the user has a different version for python vs python3?

##########
File path: sdks/python/container/base_image_requirements_manual.txt
##########
@@ -0,0 +1,41 @@
+###############################################################################
+#  Licensed to the Apache Software Foundation (ASF) under one
+#  or more contributor license agreements.  See the NOTICE file
+#  distributed with this work for additional information
+#  regarding copyright ownership.  The ASF licenses this file
+#  to you under the Apache License, Version 2.0 (the
+#  "License"); you may not use this file except in compliance
+#  with the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+# limitations under the License.
+###############################################################################
+
+# This file has additional packages to install in Beam's containers.
+# Do not specify dependencies, which are already in Beam's setup.py.
+# See generate_pip_requirements_list.sh to find which extras (e.g. gcp) are used.
+# Do not specify a constrained version without a particular reason, so that
+# we can pick up recent versions when regenerating the requirements.
+# Consider constraining requirements of Beam itself when necessary.
+
+bs4  # Commonly used HTML processing tool.
+# Let's upgrade to Cython 3.0 in a controlled way, stay on 0.x versions for now.

Review comment:
       Is this supposed to be a TODO?

##########
File path: website/www/site/content/en/contribute/release-guide.md
##########
@@ -230,7 +231,19 @@ If you are not a PMC, please ask for help in dev@ mailing list.
 **********
 
 
-## 3. Investigate performance regressions
+## 3. Update base image dependencies for Python container images
+
+1. Check the versions specified in sdks/python/container/base_image_requirements_manual.txt` and update them if necessary.
+2. Regenerate full dependency list by running:

Review comment:
       prefer 1. for numbered lists in markdown.

##########
File path: sdks/python/container/run_generate_requirements.sh
##########
@@ -0,0 +1,70 @@
+#!/bin/bash
+#
+#    Licensed to the Apache Software Foundation (ASF) under one or more
+#    contributor license agreements.  See the NOTICE file distributed with
+#    this work for additional information regarding copyright ownership.
+#    The ASF licenses this file to You under the Apache License, Version 2.0
+#    (the "License"); you may not use this file except in compliance with
+#    the License.  You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#    Unless required by applicable law or agreed to in writing, software
+#    distributed under the License is distributed on an "AS IS" BASIS,
+#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#    See the License for the specific language governing permissions and
+#    limitations under the License.
+#
+
+if [[ $# != 2 ]]; then
+  printf "Usage: \n$> ./sdks/python/container/run_generate_requirements.sh <python_version> <sdk_tarball>"
+  printf "\n\tpython_version: [required] Python version to generate dependencies for."
+  printf " Use 3.7 for Python3.7, 3.8 for Python3.8 etc."
+fi
+
+set -ex
+PY_VERSION=$1
+SDK_TARBALL=$2
+
+ENV_PATH="$PWD/build/python${PY_VERSION/./}_requirements_gen"
+rm -rf $ENV_PATH 2>/dev/null || true
+python${PY_VERSION} -m venv $ENV_PATH
+source $ENV_PATH/bin/activate
+pip install --upgrade pip
+pip install wheel
+
+pip install --no-cache-dir $SDK_TARBALL[gcp,dataframe]
+pip install --no-cache-dir -r $PWD/sdks/python/container/base_image_requirements_manual.txt
+pip uninstall -y apache-beam
+echo "Checking for broken dependencies:"
+pip check

Review comment:
       Does this error out if there are broken deps?

##########
File path: sdks/python/container/run_generate_requirements.sh
##########
@@ -0,0 +1,70 @@
+#!/bin/bash
+#
+#    Licensed to the Apache Software Foundation (ASF) under one or more
+#    contributor license agreements.  See the NOTICE file distributed with
+#    this work for additional information regarding copyright ownership.
+#    The ASF licenses this file to You under the Apache License, Version 2.0
+#    (the "License"); you may not use this file except in compliance with
+#    the License.  You may obtain a copy of the License at
+#
+#       http://www.apache.org/licenses/LICENSE-2.0
+#
+#    Unless required by applicable law or agreed to in writing, software
+#    distributed under the License is distributed on an "AS IS" BASIS,
+#    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#    See the License for the specific language governing permissions and
+#    limitations under the License.
+#
+

Review comment:
       I wish more of these files had a high level description of what they were.
   
   For example with this one, I would like to know if it's meant to be run manually, or if its a utility script for a tool or build rule.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org