Posted to commits@airflow.apache.org by po...@apache.org on 2021/09/09 19:01:08 UTC

[airflow] 06/06: Move instructions for constraint/image refreshing to dev

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit ae068686b6a07d7214b53c5ac6367caf4c1980bc
Author: Jarek Potiuk <ja...@potiuk.com>
AuthorDate: Mon Aug 23 11:35:44 2021 +0200

    Move instructions for constraint/image refreshing to dev
    
    When we have a prolonged issue with flaky tests or GitHub runner
    instabilities, our automated constraint and image refresh might
    not work, so we might need to manually refresh the constraints
    and images. Documentation about that was in CONTRIBUTING.rst,
    but it is more appropriate to keep it in ``dev`` as it only applies
    to committers.
    
    Also, while testing the parallel refresh without delays, an error
    was discovered which prevented parallel checks of the random image
    hash during the build: the local and remote build cache hash files
    were shared by all Python versions. They are now stored per Python
    version, so parallel image cache building should now work without
    conflicts.
    
    (cherry picked from commit 36c5fd3df9b271702e1dd2d73c579de3f3bd5fc0)
---
 CONTRIBUTING.rst                        | 39 --------------
 dev/REFRESHING_CI_CACHE.md              | 94 +++++++++++++++++++++++++++++++++
 dev/refresh_images.sh                   | 38 +++++++++++++
 scripts/ci/libraries/_build_images.sh   | 68 +++++++++++++-----------
 scripts/ci/libraries/_initialization.sh | 11 ----
 5 files changed, 170 insertions(+), 80 deletions(-)

diff --git a/CONTRIBUTING.rst b/CONTRIBUTING.rst
index be807f4..577f925 100644
--- a/CONTRIBUTING.rst
+++ b/CONTRIBUTING.rst
@@ -860,45 +860,6 @@ The ``constraints-<PYTHON_MAJOR_MINOR_VERSION>.txt`` and ``constraints-no-provid
 will be automatically regenerated by CI job every time after the ``setup.py`` is updated and pushed
 if the tests are successful.
 
-Manually generating constraint files
-------------------------------------
-
-The constraint files are generated automatically by the CI job. Sometimes however it is needed to regenerate
-them manually (committers only). For example when main build did not succeed for quite some time).
-This can be done by running this (it utilizes parallel preparation of the constraints):
-
-.. code-block:: bash
-
-    export CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8 3.9"
-    for python_version in $(echo "${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}")
-    do
-      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
-      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
-      ./breeze build-image --upgrade-to-newer-dependencies --python ${python_version} --build-cache-local
-    done
-
-    GENERATE_CONSTRAINTS_MODE="pypi-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
-    GENERATE_CONSTRAINTS_MODE="source-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
-    GENERATE_CONSTRAINTS_MODE="no-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
-
-    AIRFLOW_SOURCES=$(pwd)
-
-
-The constraints will be generated in "files/constraints-PYTHON_VERSION/constraints-*.txt files. You need to
-checkout the right 'constraints-' branch in a separate repository and then you can copy, commit and push the
-generated files:
-
-.. code-block:: bash
-
-    cd <AIRFLOW_WITH_CONSTRAINT_main_DIRECTORY>
-    git pull
-    cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
-    git diff
-    git add .
-    git commit -m "Your commit message here" --no-verify
-    git push
-
-
 Documentation
 =============
 
diff --git a/dev/REFRESHING_CI_CACHE.md b/dev/REFRESHING_CI_CACHE.md
new file mode 100644
index 0000000..c5a27ee
--- /dev/null
+++ b/dev/REFRESHING_CI_CACHE.md
@@ -0,0 +1,94 @@
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements.  See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.  The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied.  See the License for the
+ specific language governing permissions and limitations
+ under the License.
+-->
+
+<!-- START doctoc generated TOC please keep comment here to allow auto update -->
+<!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE -->
+**Table of Contents**  *generated with [DocToc](https://github.com/thlorenz/doctoc)*
+
+- [Automated cache refreshing in CI](#automated-cache-refreshing-in-ci)
+- [Manually generating constraint files](#manually-generating-constraint-files)
+- [Manually refreshing the images](#manually-refreshing-the-images)
+
+<!-- END doctoc generated TOC please keep comment here to allow auto update -->
+
+# Automated cache refreshing in CI
+
+Our [CI system](../CI.rst) is built in a way that it maintains itself. Regular scheduled builds and
+merges to the `main` branch have a separate maintenance step that takes care of refreshing the cache that is
+used to speed up our builds and to speed up rebuilding of [Breeze](../BREEZE.rst) images for development
+purposes. Usually, all of this happens automatically:
+
+* The latest [constraints](../COMMITTERS.rst#pinned-constraint-files) are pushed to the appropriate branch
+  after all tests succeed in a `main` merge or in a `scheduled` build
+
+* The [images](../IMAGES.rst) in the `ghcr.io` registry are refreshed after every successful merge to `main`
+  or `scheduled` build, after the constraints are pushed. This means that the latest image cache also uses
+  the latest tested constraints
+
+Sometimes, however, when we have a prolonged period of fighting with the flakiness of GitHub Actions runners or
+our tests, the refresh might not be triggered, because tests do not succeed for some time. In this case, a
+manual refresh might be needed.
+
+# Manually generating constraint files
+
+```bash
+export CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING="3.6 3.7 3.8 3.9"
+for python_version in ${CURRENT_PYTHON_MAJOR_MINOR_VERSIONS_AS_STRING}
+do
+  ./breeze build-image --upgrade-to-newer-dependencies --python "${python_version}" --build-cache-local
+done
+
+GENERATE_CONSTRAINTS_MODE="pypi-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+GENERATE_CONSTRAINTS_MODE="source-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+GENERATE_CONSTRAINTS_MODE="no-providers" ./scripts/ci/constraints/ci_generate_all_constraints.sh
+
+AIRFLOW_SOURCES=$(pwd)
+```
+
+The constraints will be generated in `files/constraints-PYTHON_VERSION/constraints-*.txt` files. You need to
+check out the right 'constraints-' branch in a separate repository, and then you can copy, commit and push the
+generated files:
+
+```bash
+cd <AIRFLOW_WITH_CONSTRAINTS-MAIN_DIRECTORY>
+git pull
+cp ${AIRFLOW_SOURCES}/files/constraints-*/constraints*.txt .
+git diff
+git add .
+git commit -m "Your commit message here" --no-verify
+git push
+```
+
+# Manually refreshing the images
+
+The images can be rebuilt and refreshed after the constraints are pushed. Refreshing the image for a particular
+Python version is as simple as running the [refresh_images.sh](refresh_images.sh) script with the Python version
+as a parameter:
+
+```bash
+./dev/refresh_images.sh 3.9
+```
+
+If you have a fast network and a powerful computer, you can refresh the images in parallel by running
+[refresh_images.sh](refresh_images.sh) for all Python versions. You can do it manually with `tmux`
+or with GNU parallel:
+
+```bash
+parallel -j 4 --linebuffer --tagstring '{}' ./dev/refresh_images.sh ::: 3.6 3.7 3.8 3.9
+```
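The GNU parallel invocation in `dev/REFRESHING_CI_CACHE.md` above can also be approximated in plain bash when `parallel` is not installed: start one background job per Python version, then `wait` for each one and report failures. This is a minimal sketch and not part of the commit. The `refresh_all` function and the `REFRESH_SCRIPT` and `LOG_DIR` variables are hypothetical, and the default script path assumes the snippet runs from the root of the Airflow sources.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical knobs: the script run once per Python version, and where per-version logs go.
REFRESH_SCRIPT="${REFRESH_SCRIPT:-./dev/refresh_images.sh}"
LOG_DIR="${LOG_DIR:-/tmp/refresh-images-logs}"

# Runs REFRESH_SCRIPT for every Python version passed as an argument, all in parallel.
# Each job writes to its own log file; returns non-zero if any job failed.
refresh_all() {
    local python_version i
    local -a pids=()
    local -a versions=()
    local failed=0
    mkdir -p "${LOG_DIR}"
    for python_version in "$@"; do
        "${REFRESH_SCRIPT}" "${python_version}" >"${LOG_DIR}/refresh-${python_version}.log" 2>&1 &
        pids+=("$!")
        versions+=("${python_version}")
    done
    # Wait for every job so that one failure does not leave the others unreported.
    for i in "${!pids[@]}"; do
        if ! wait "${pids[$i]}"; then
            echo "Refresh failed for Python ${versions[$i]}, see ${LOG_DIR}/refresh-${versions[$i]}.log" >&2
            failed=1
        fi
    done
    return "${failed}"
}
```

After defining the function, `refresh_all 3.6 3.7 3.8 3.9` starts one job per version. Unlike GNU parallel with `--tagstring`, the output is not tagged on the terminal. It is kept in a separate log file per version instead.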
diff --git a/dev/refresh_images.sh b/dev/refresh_images.sh
new file mode 100755
index 0000000..38e283a
--- /dev/null
+++ b/dev/refresh_images.sh
@@ -0,0 +1,38 @@
+#!/bin/bash
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -euo pipefail
+rm -rf docker-context-files/*.whl
+rm -rf docker-context-files/*.tgz
+export FORCE_ANSWER_TO_QUESTIONS="true"
+export CI="true"
+
+if [[ -z "${1:-}" ]]; then
+  echo
+  echo "ERROR! Please specify the Python version as a parameter"
+  echo
+  exit 1
+fi
+
+python_version=$1
+
+./breeze build-image --python "${python_version}" --build-cache-pulled  --check-if-base-python-image-updated --verbose
+./breeze build-image --python "${python_version}" --build-cache-pulled  --production-image --verbose
+
+./breeze push-image --python "${python_version}"
+./breeze push-image --production-image --python "${python_version}"
diff --git a/scripts/ci/libraries/_build_images.sh b/scripts/ci/libraries/_build_images.sh
index 3ce020b..88169a0 100644
--- a/scripts/ci/libraries/_build_images.sh
+++ b/scripts/ci/libraries/_build_images.sh
@@ -232,16 +232,13 @@ function build_images::check_for_docker_context_files() {
     fi
 }
 
-# Builds local image manifest
-# It contains only one .json file - result of docker inspect - describing the image
-# We cannot use docker registry APIs as they are available only with authorisation
-# But this image can be pulled without authentication
+# Builds local image manifest. It contains only one random file generated during the Dockerfile.ci build
 function build_images::build_ci_image_manifest() {
     docker_v build \
         --tag="${AIRFLOW_CI_LOCAL_MANIFEST_IMAGE}" \
         -f- . <<EOF
 FROM scratch
-COPY "manifests/local-build-cache-hash" /build-cache-hash
+COPY "manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}" /build-cache-hash
 LABEL org.opencontainers.image.source="https://github.com/${GITHUB_REPOSITORY}"
 CMD ""
 EOF
@@ -249,9 +246,13 @@ EOF
 
 #
 # Retrieves information about build cache hash random file from the local image
+# The random file is generated during the build and is the best indicator of whether your local CI image
+# has been built using the same pulled image as the remote one
 #
 function build_images::get_local_build_cache_hash() {
     set +e
+    local local_image_build_cache_file
+    local_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     # Remove the container just in case
     docker_v rm --force "local-airflow-ci-container" 2>/dev/null >/dev/null
     if ! docker_v inspect "${AIRFLOW_CI_IMAGE}" 2>/dev/null >/dev/null; then
@@ -260,34 +261,37 @@ function build_images::get_local_build_cache_hash() {
         verbosity::print_info
         LOCAL_MANIFEST_IMAGE_UNAVAILABLE="true"
         export LOCAL_MANIFEST_IMAGE_UNAVAILABLE
-        touch "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}"
+        touch "${local_image_build_cache_file}"
         set -e
         return
 
     fi
     docker_v create --name "local-airflow-ci-container" "${AIRFLOW_CI_IMAGE}" 2>/dev/null
     docker_v cp "local-airflow-ci-container:/build-cache-hash" \
-        "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}" 2>/dev/null ||
-        touch "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}"
+        "${local_image_build_cache_file}" 2>/dev/null ||
+        touch "${local_image_build_cache_file}"
     set -e
     verbosity::print_info
-    verbosity::print_info "Local build cache hash: '$(cat "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}")'"
+    verbosity::print_info "Local build cache hash: '$(cat "${local_image_build_cache_file}")'"
     verbosity::print_info
 }
 
 # Retrieves information about the build cache hash random file from the remote image.
-# We actually use manifest image for that, which is a really, really small image to pull!
-# The problem is that inspecting information about remote image cannot be done easily with existing APIs
-# of Dockerhub because they require additional authentication even for public images.
-# Therefore instead we are downloading a specially prepared manifest image
-# which is built together with the main image and pushed with it. This special manifest image is prepared
-# during building of the main image and contains single file which is randomly built during the docker
-# build in the right place in the image (right after installing all dependencies of Apache Airflow
-# for the first time). When this random file gets regenerated it means that either base image has
-# changed or some of the earlier layers was modified - which means that it is usually faster to pull
-# that image first and then rebuild it - because this will likely be faster
+# We use the manifest image for that, which is a really, really small image to pull!
+# The image is a specially prepared manifest image which is built together with the main image and
+# pushed with it. This special manifest image is prepared during building of the CI image and contains
+# a single file which is generated with random content during the docker
+# build at the right step of the image build (right after installing all dependencies of Apache Airflow
+# for the first time).
+# When this random file gets regenerated, it means that either the base image has changed before that step
+# or some of the earlier layers were modified, which means that it is usually faster to pull
+# that image first and then rebuild it.
 function build_images::get_remote_image_build_cache_hash() {
     set +e
+    local remote_image_container_id_file
+    remote_image_container_id_file="${AIRFLOW_SOURCES}/manifests/remote-airflow-manifest-image-${PYTHON_MAJOR_MINOR_VERSION}"
+    local remote_image_build_cache_file
+    remote_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     # Pull remote manifest image
     if ! docker_v pull "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}" 2>/dev/null >/dev/null; then
         verbosity::print_info
@@ -295,32 +299,36 @@ function build_images::get_remote_image_build_cache_hash() {
         verbosity::print_info
         REMOTE_DOCKER_REGISTRY_UNREACHABLE="true"
         export REMOTE_DOCKER_REGISTRY_UNREACHABLE
-        touch "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}"
+        touch "${remote_image_build_cache_file}"
         set -e
         return
     fi
     set -e
-    rm -f "${REMOTE_IMAGE_CONTAINER_ID_FILE}"
+    rm -f "${remote_image_container_id_file}"
     # Create container dump out of the manifest image without actually running it
-    docker_v create --cidfile "${REMOTE_IMAGE_CONTAINER_ID_FILE}" "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}"
+    docker_v create --cidfile "${remote_image_container_id_file}" "${AIRFLOW_CI_REMOTE_MANIFEST_IMAGE}"
     # Extract manifest and store it in local file
-    docker_v cp "$(cat "${REMOTE_IMAGE_CONTAINER_ID_FILE}"):/build-cache-hash" \
-        "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}"
-    docker_v rm --force "$(cat "${REMOTE_IMAGE_CONTAINER_ID_FILE}")"
-    rm -f "${REMOTE_IMAGE_CONTAINER_ID_FILE}"
+    docker_v cp "$(cat "${remote_image_container_id_file}"):/build-cache-hash" \
+        "${remote_image_build_cache_file}"
+    docker_v rm --force "$(cat "${remote_image_container_id_file}")"
+    rm -f "${remote_image_container_id_file}"
     verbosity::print_info
-    verbosity::print_info "Remote build cache hash: '$(cat "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}")'"
+    verbosity::print_info "Remote build cache hash: '$(cat "${remote_image_build_cache_file}")'"
     verbosity::print_info
 }
 
 # Compares layers from both remote and local image and set FORCE_PULL_IMAGES to true in case
-# More than the last NN layers are different.
+# the random hash in the remote image is different from the one in the local image,
+# indicating that it is likely faster to pull the image from the cache rather than let the
+# image be fully rebuilt locally
 function build_images::compare_local_and_remote_build_cache_hash() {
     set +e
+    local local_image_build_cache_file
+    local_image_build_cache_file="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}"
     local remote_hash
-    remote_hash=$(cat "${REMOTE_IMAGE_BUILD_CACHE_HASH_FILE}")
+    remote_hash=$(cat "${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash-${PYTHON_MAJOR_MINOR_VERSION}")
     local local_hash
-    local_hash=$(cat "${LOCAL_IMAGE_BUILD_CACHE_HASH_FILE}")
+    local_hash=$(cat "${local_image_build_cache_file}")
 
     if [[ ${remote_hash} != "${local_hash}" || -z ${local_hash} ]] \
         ; then
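The check in `build_images::compare_local_and_remote_build_cache_hash` reads two per-Python-version files. It treats a mismatch, or an empty local hash, as the case where pulling the remote image first is likely faster. The sketch below restates that condition in isolation. It assumes the file layout introduced in this commit: `manifests/local-build-cache-hash-<version>` and `manifests/remote-build-cache-hash-<version>` under the sources directory. The `should_pull_remote_image` name and its arguments are hypothetical. The sketch only returns a status and does not set `FORCE_PULL_IMAGES` itself.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: returns 0 ("pull the remote image first") when the random build cache
# hash of the local CI image differs from the remote one, or when the local hash is empty.
# Arguments: <airflow_sources_dir> <python_major_minor_version>
should_pull_remote_image() {
    local sources_dir="$1"
    local python_version="$2"
    # Per-version file names, so builds for different Python versions running in parallel
    # do not overwrite each other's hash files.
    local local_file="${sources_dir}/manifests/local-build-cache-hash-${python_version}"
    local remote_file="${sources_dir}/manifests/remote-build-cache-hash-${python_version}"
    local local_hash=""
    local remote_hash=""
    # A missing file is treated the same as an empty hash.
    [[ -f "${local_file}" ]] && local_hash="$(cat "${local_file}")"
    [[ -f "${remote_file}" ]] && remote_hash="$(cat "${remote_file}")"
    [[ "${remote_hash}" != "${local_hash}" || -z "${local_hash}" ]]
}
```

A caller could then use something like `if should_pull_remote_image "${AIRFLOW_SOURCES}" "${PYTHON_MAJOR_MINOR_VERSION}"; then FORCE_PULL_IMAGES="true"; fi`. This roughly matches what the comment above says the library function does with its own variables.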
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index d392d98..6068880 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -558,12 +558,6 @@ function initialization::initialize_package_variables() {
 }
 
 
-function initialization::initialize_build_image_variables() {
-    REMOTE_IMAGE_CONTAINER_ID_FILE="${AIRFLOW_SOURCES}/manifests/remote-airflow-manifest-image"
-    LOCAL_IMAGE_BUILD_CACHE_HASH_FILE="${AIRFLOW_SOURCES}/manifests/local-build-cache-hash"
-    REMOTE_IMAGE_BUILD_CACHE_HASH_FILE="${AIRFLOW_SOURCES}/manifests/remote-build-cache-hash"
-}
-
 function initialization::set_output_color_variables() {
     COLOR_BLUE=$'\e[34m'
     COLOR_GREEN=$'\e[32m'
@@ -597,7 +591,6 @@ function initialization::initialize_common_environment() {
     initialization::initialize_github_variables
     initialization::initialize_test_variables
     initialization::initialize_package_variables
-    initialization::initialize_build_image_variables
 }
 
 function initialization::set_default_python_version_if_empty() {
@@ -845,10 +838,6 @@ function initialization::make_constants_read_only() {
     readonly BUILT_CI_IMAGE_FLAG_FILE
     readonly INIT_SCRIPT_FILE
 
-    readonly REMOTE_IMAGE_CONTAINER_ID_FILE
-    readonly LOCAL_IMAGE_BUILD_CACHE_HASH_FILE
-    readonly REMOTE_IMAGE_BUILD_CACHE_HASH_FILE
-
     readonly INSTALLED_EXTRAS
     readonly INSTALLED_PROVIDERS