You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/02/21 18:50:07 UTC

[GitHub] [airflow] potiuk opened a new pull request #14347: Fix caching of python images during builds

potiuk opened a new pull request #14347:
URL: https://github.com/apache/airflow/pull/14347


   After GHCR change, github image caching for python images was
   broken. GHCR requires each image that we push to it to be
   tagged with "org.opencontainers.image.source" label to point
   to the right project repository. Otherwise it will not appear
   in the repository images and we will not be able to manage
   permissions of the image.
   
   So we'v been extending the python base image with appropriate
   label, but it was not in all places necessary. As the result.
   the builds in CI were usign different images than the cache
   image in DockerHub, which in turn caused full image rebuilds
   ALWAYS. By implementing this change we make sure tha the the
   "airflow-tainted" python images - including the label.
   
   All images (when using breeze) are now built using such
   airflow-tainted image which in turn should give5 up to 5-8
   minutes saving on building the images job per each job (CI
   and production)
   
   The change implements the following behaviours:
   
   1) When image is built on CI or locally without
      UPGRADE_TO_NEWER_DEPENDENCIES flag and without
      FORCE_PULL_IMAGES flag -t the latest image from the
      GitHub Registry or DockerHub registry is used.
   
   2) When both FORCE_PULL_IMAGES and UPGRADE_TO_NEWER_DEPENDENCY
      are set to something different than "false" - the latest Python
      images are used - they are pulled first and tagged appropriately.
   
   This way - always when UPGRADE_TO_NEWER_DEPENDENCIES are set (also
   when master pushes are done) - such builds will use latest version
   of python base images published on DockerHub.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-782911535


   HEy @ashb , @kaxil (others)- this one should speed up the image builds quite significantly (especially in self-hosted runners) it should fix the problem that we had recently that Image builds did not use caching properly - this was a problem introduced with GiHub Container Registry switch (labelling python image necessary for GHCR). This PR should fix it and bring down the time of builds for images by a lot.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580612782



##########
File path: scripts/ci/libraries/_initialization.sh
##########
@@ -726,12 +728,8 @@ function initialization::get_environment_for_builds_on_ci() {
         export CI_REF="${CI_REF="refs/head/master"}"
     fi
 
-    if [[ ${VERBOSE} == "true" && ${PRINT_INFO_FROM_SCRIPTS} == "true" ]]; then
-        initialization::summarize_build_environment
-    fi
-
     if [[ -z "${LIBRARY_PATH:-}" && -n "${LD_LIBRARY_PATH:-}" ]]; then
-      export LIBRARY_PATH="$LD_LIBRARY_PATH"
+      export LIBRARY_PATH="${LD_LIBRARY_PATH}"

Review comment:
       This is something I learned to always use braces for variables. While they are optional, they give clear boundaries. So I add them by default for all variables always. I don't even think about it when I do it whenever I am around such variable. 
   
   It's also consistent with Google shell guide which we adopted: 
   
   https://google.github.io/styleguide/shellguide.html#variable-expansion
   
   > In order of precedence: Stay consistent with what you find; quote your variables; prefer "${var}" over "$var".
   
   > Don’t brace-delimit single character shell specials / positional parameters, unless strictly necessary or avoiding deep confusion.
   > Prefer brace-delimiting all other variables.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-784148821


   Yep. And I also have to take a look why prod image is not using the cache yet (it will be few more minutes to shave off). It's not as important as the CI build because PROD image was not on a critical path - CI build is prerequiste for all others so speeding it up directly impacts "feedback time" - which for me is the most important metrics of the CI system..
   
   I define "feedback time" as the elapsed time that passes between you pushing the PR and actionable feedback from the build telling you that it is either "ok" or "needs fixes". The smaller, the better.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-783691137


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580604484



##########
File path: scripts/ci/libraries/_initialization.sh
##########
@@ -726,12 +728,8 @@ function initialization::get_environment_for_builds_on_ci() {
         export CI_REF="${CI_REF="refs/head/master"}"
     fi
 
-    if [[ ${VERBOSE} == "true" && ${PRINT_INFO_FROM_SCRIPTS} == "true" ]]; then
-        initialization::summarize_build_environment
-    fi
-
     if [[ -z "${LIBRARY_PATH:-}" && -n "${LD_LIBRARY_PATH:-}" ]]; then
-      export LIBRARY_PATH="$LD_LIBRARY_PATH"
+      export LIBRARY_PATH="${LD_LIBRARY_PATH}"

Review comment:
       Why change this line?

##########
File path: scripts/ci/libraries/_push_pull_remove_images.sh
##########
@@ -104,37 +104,58 @@ function push_pull_remove_images::pull_image_github_dockerhub() {
     set -e
 }
 
-# Pulls CI image in case caching strategy is "pulled" and the image needs to be pulled
-function push_pull_remove_images::pull_ci_images_if_needed() {
-    if [[ "${DOCKER_CACHE}" == "pulled" ]]; then
-        local python_image_hash
-        python_image_hash=$(docker images -q "${AIRFLOW_CI_PYTHON_IMAGE}" 2> /dev/null || true)
-        if [[ -z "${python_image_hash=}" ]]; then
-            FORCE_PULL_IMAGES="true"
-        fi
-        if [[ "${FORCE_PULL_IMAGES}" == "true" ]]; then
-            echo
-            echo "Force pull base image ${PYTHON_BASE_IMAGE}"
-            echo
-            if [[ -n ${DETECTED_TERMINAL=} ]]; then
-                echo -n "
-Docker pulling ${PYTHON_BASE_IMAGE}.
+# Pulls base python image used to build CI and PROD imaages, depending on the parameters used

Review comment:
       Typo




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-783646756


   Hey @kaxil @ashb - how about speeding up our CI :) ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580896095



##########
File path: scripts/ci/libraries/_push_pull_remove_images.sh
##########
@@ -104,37 +104,58 @@ function push_pull_remove_images::pull_image_github_dockerhub() {
     set -e
 }
 
-# Pulls CI image in case caching strategy is "pulled" and the image needs to be pulled
-function push_pull_remove_images::pull_ci_images_if_needed() {
-    if [[ "${DOCKER_CACHE}" == "pulled" ]]; then
-        local python_image_hash
-        python_image_hash=$(docker images -q "${AIRFLOW_CI_PYTHON_IMAGE}" 2> /dev/null || true)
-        if [[ -z "${python_image_hash=}" ]]; then
-            FORCE_PULL_IMAGES="true"
-        fi
-        if [[ "${FORCE_PULL_IMAGES}" == "true" ]]; then
-            echo
-            echo "Force pull base image ${PYTHON_BASE_IMAGE}"
-            echo
-            if [[ -n ${DETECTED_TERMINAL=} ]]; then
-                echo -n "
-Docker pulling ${PYTHON_BASE_IMAGE}.
+# Pulls base python image used to build CI and PROD imaages, depending on the parameters used

Review comment:
       It was `imaages`, sorry. Was reviewing on my phone which makes it harder to be verbose. Not important.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #14347:
URL: https://github.com/apache/airflow/pull/14347


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-784148821


   Yep. And I also have to take a look why prod image is not using the cache yet (it will be few more minutes to shave off). It's not as important as the CI build because PROD image was not on a critical path - CI build is prerequiste for all others so speeding it up directly impact "feedback time" - which for me is the most important metrics of the CI system..
   
   I define "feedback time" as the elapsed time that passes between you pushing the PR and actionable feedback from the build telling you that it is either "ok" or "needs fixes". The smaller, the better.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-782936332


   [The Workflow run](https://github.com/apache/airflow/actions/runs/587309236) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580612782



##########
File path: scripts/ci/libraries/_initialization.sh
##########
@@ -726,12 +728,8 @@ function initialization::get_environment_for_builds_on_ci() {
         export CI_REF="${CI_REF="refs/head/master"}"
     fi
 
-    if [[ ${VERBOSE} == "true" && ${PRINT_INFO_FROM_SCRIPTS} == "true" ]]; then
-        initialization::summarize_build_environment
-    fi
-
     if [[ -z "${LIBRARY_PATH:-}" && -n "${LD_LIBRARY_PATH:-}" ]]; then
-      export LIBRARY_PATH="$LD_LIBRARY_PATH"
+      export LIBRARY_PATH="${LD_LIBRARY_PATH}"

Review comment:
       This is something I learned to always use braces for variables. While they are optional, they give clear boundaries. So I add them by default for all variables always. I don't even think about it when I do whenever I am around such variable. 
   
   It's also consistent with Google shell guide which we adopted: 
   
   https://google.github.io/styleguide/shellguide.html#variable-expansion
   
   > In order of precedence: Stay consistent with what you find; quote your variables; prefer "${var}" over "$var".
   
   > Don’t brace-delimit single character shell specials / positional parameters, unless strictly necessary or avoiding deep confusion.
   > Prefer brace-delimiting all other variables.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580896833



##########
File path: scripts/ci/libraries/_initialization.sh
##########
@@ -726,12 +728,8 @@ function initialization::get_environment_for_builds_on_ci() {
         export CI_REF="${CI_REF="refs/head/master"}"
     fi
 
-    if [[ ${VERBOSE} == "true" && ${PRINT_INFO_FROM_SCRIPTS} == "true" ]]; then
-        initialization::summarize_build_environment
-    fi
-
     if [[ -z "${LIBRARY_PATH:-}" && -n "${LD_LIBRARY_PATH:-}" ]]; then
-      export LIBRARY_PATH="$LD_LIBRARY_PATH"
+      export LIBRARY_PATH="${LD_LIBRARY_PATH}"

Review comment:
       It's more the change of this line in this PR I was commenting on. By including changes like this that are otherwise unrelated to the PR at hand it makes it much harder to review as there is more diff to look through.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-783003014


   [The Workflow run](https://github.com/apache/airflow/actions/runs/587772243) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#discussion_r580617975



##########
File path: scripts/ci/libraries/_push_pull_remove_images.sh
##########
@@ -104,37 +104,58 @@ function push_pull_remove_images::pull_image_github_dockerhub() {
     set -e
 }
 
-# Pulls CI image in case caching strategy is "pulled" and the image needs to be pulled
-function push_pull_remove_images::pull_ci_images_if_needed() {
-    if [[ "${DOCKER_CACHE}" == "pulled" ]]; then
-        local python_image_hash
-        python_image_hash=$(docker images -q "${AIRFLOW_CI_PYTHON_IMAGE}" 2> /dev/null || true)
-        if [[ -z "${python_image_hash=}" ]]; then
-            FORCE_PULL_IMAGES="true"
-        fi
-        if [[ "${FORCE_PULL_IMAGES}" == "true" ]]; then
-            echo
-            echo "Force pull base image ${PYTHON_BASE_IMAGE}"
-            echo
-            if [[ -n ${DETECTED_TERMINAL=} ]]; then
-                echo -n "
-Docker pulling ${PYTHON_BASE_IMAGE}.
+# Pulls base python image used to build CI and PROD imaages, depending on the parameters used

Review comment:
       I reworded it a bit including `python -> Python` if that was the typo (the function changed - it is really Python base image that is pulled by the function). 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #14347: Fix caching of python images during builds

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #14347:
URL: https://github.com/apache/airflow/pull/14347#issuecomment-784047599


   Nice, down to 1m to build an image https://github.com/apache/airflow/runs/1959651260?check_suite_focus=true


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org