Posted to commits@airflow.apache.org by po...@apache.org on 2020/06/29 15:21:54 UTC

[airflow] branch v1-10-test updated (eceae90 -> 4526519)

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a change to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git.


    from eceae90  [AIRFLOW-6957] Make retrieving Paused Dag ids a separate method
     new 3b2d5cc  Remove redundant code from breeze initialization (#9375)
     new e348037  Add missing precommit-hook ids to breeze-complete (#9524)
     new 84d6edd  Gunicorn works better if temporary folder uses tmpfs (#9534)
     new 3749cfe  Make Production Dockerfile OpenShift-compatible (#9545)
     new 4526519  More sensible docker caching strategy for Prod images (#9547)

The 5 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .dockerignore                                      |  2 -
 .github/workflows/ci.yml                           | 16 ++--
 BREEZE.rst                                         | 96 +++++++++++++++++++---
 Dockerfile                                         | 35 +++++---
 IMAGES.rst                                         | 64 +++++++++++----
 breeze                                             | 48 +++++++++--
 breeze-complete                                    |  9 +-
 ...image_on_ci.sh => ci_prepare_ci_image_on_ci.sh} |  3 -
 ...ace_on_ci.sh => ci_prepare_prod_image_on_ci.sh} |  6 +-
 scripts/ci/libraries/_build_images.sh              | 79 ++++++++++++++----
 scripts/ci/libraries/_initialization.sh            | 11 +--
 scripts/prod/entrypoint_prod.sh                    |  9 ++
 12 files changed, 287 insertions(+), 91 deletions(-)
 rename scripts/ci/{ci_prepare_image_on_ci.sh => ci_prepare_ci_image_on_ci.sh} (91%)
 copy scripts/ci/{ci_free_space_on_ci.sh => ci_prepare_prod_image_on_ci.sh} (91%)


[airflow] 05/05: More sensible docker caching strategy for Prod images (#9547)

Posted by po...@apache.org.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 4526519a7827fe03d312d731b1e808d9775cdfc4
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Sun Jun 28 17:38:17 2020 +0200

    More sensible docker caching strategy for Prod images (#9547)
    
    Local caching is now the default strategy when building
    the Production image.
    
    You can still change it to pulled, as in CI builds, by
    providing the right build flag; this is what CI uses by
    default. The Breeze flags are now more explanatory and
    friendly (build-cache-*), and a flag for the "disabled"
    cache option is added as well.
    
    Also, the Dockerfile and Dockerfile.ci files are no longer
    needed in the docker context. They used to be needed when
    we built the Kubernetes image in the container, but since
    we now use the production image directly, we do not need
    them any more.
    
    Combining setting the default strategy to local and removing
    the Dockerfile from the context has the nice effect that you
    can iterate much faster on the Production image without
    triggering rebuilds of half of the docker image
    as soon as the Dockerfile changes.
    
    (cherry picked from commit 6aabd9a86e9582ca762ee143ddce5b6ee1619684)
---
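
A minimal sketch of the defaulting described in the commit message above, under stated assumptions: the function name resolve_docker_cache and its "prod"/"ci" argument are hypothetical and do not exist in Breeze. The patch itself applies the default inside prepare_prod_build and prepare_ci_build, and CI runs may override the value separately.

```shell
# Hypothetical helper, not part of the patch: reports which cache strategy
# a build would end up with. An explicitly exported DOCKER_CACHE wins;
# otherwise production images default to "local" and CI images to "pulled".
resolve_docker_cache() {
    if [ -n "${DOCKER_CACHE:-}" ]; then
        echo "${DOCKER_CACHE}"
    elif [ "$1" = "prod" ]; then
        echo "local"
    else
        echo "pulled"
    fi
}
```

With DOCKER_CACHE unset, `resolve_docker_cache prod` prints the production default; after `export DOCKER_CACHE="disabled"` the same call reports the override instead.
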
 .dockerignore                                      |  2 -
 .github/workflows/ci.yml                           | 16 ++--
 BREEZE.rst                                         | 85 ++++++++++++++++++++--
 IMAGES.rst                                         | 52 +++++++++----
 breeze                                             | 48 ++++++++++--
 breeze-complete                                    |  6 +-
 ...image_on_ci.sh => ci_prepare_ci_image_on_ci.sh} |  3 -
 ...age_on_ci.sh => ci_prepare_prod_image_on_ci.sh} |  5 +-
 scripts/ci/libraries/_build_images.sh              | 78 ++++++++++++++++----
 scripts/ci/libraries/_initialization.sh            |  5 --
 10 files changed, 232 insertions(+), 68 deletions(-)

diff --git a/.dockerignore b/.dockerignore
index 0a89434..6f89516 100644
--- a/.dockerignore
+++ b/.dockerignore
@@ -42,8 +42,6 @@
 !.dockerignore
 !pytest.ini
 !CHANGELOG.txt
-!Dockerfile.ci
-!Dockerfile
 !LICENSE
 !MANIFEST.in
 !NOTICE
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index 67b9e50..bc86961 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -62,7 +62,7 @@ jobs:
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Static checks"
         if: success()
         run: |
@@ -83,7 +83,7 @@ jobs:
         with:
           python-version: '3.6'
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Build docs"
         run: ./scripts/ci/ci_docs.sh
 
@@ -200,7 +200,7 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Tests"
         run: ./scripts/ci/ci_run_airflow_testing.sh
 
@@ -232,7 +232,7 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Tests"
         run: ./scripts/ci/ci_run_airflow_testing.sh
 
@@ -262,7 +262,7 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Tests"
         run: ./scripts/ci/ci_run_airflow_testing.sh
 
@@ -293,7 +293,7 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Tests"
         run: ./scripts/ci/ci_run_airflow_testing.sh
 
@@ -316,7 +316,7 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image ${{ matrix.python-version }}"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Generate requirements"
         run: ./scripts/ci/ci_generate_requirements.sh
 
@@ -374,6 +374,6 @@ ${{ hashFiles('requirements/requirements-python${{matrix.python-version}}.txt')
       - name: "Free space"
         run: ./scripts/ci/ci_free_space_on_ci.sh
       - name: "Build CI image"
-        run: ./scripts/ci/ci_prepare_image_on_ci.sh
+        run: ./scripts/ci/ci_prepare_ci_image_on_ci.sh
       - name: "Push CI image ${{ matrix.python-version }}"
         run: ./scripts/ci/ci_push_ci_image.sh
diff --git a/BREEZE.rst b/BREEZE.rst
index 441c19b..ee0ff01 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -790,7 +790,9 @@ This is the current syntax for  `./breeze <./breeze>`_:
 
         Builds docker image (CI or production) without entering the container. You can pass
         additional options to this command, such as '--force-build-image',
-        '--force-pull-image' '--python' '--use-local-cache'' in order to modify build behaviour.
+        '--force-pull-image', '--python', '--build-cache-local' or '-build-cache-pulled'
+        in order to modify build behaviour.
+
         You can also pass '--production-image' flag to build production image rather than CI image.
 
   Flags:
@@ -851,9 +853,32 @@ This is the current syntax for  `./breeze <./breeze>`_:
           Force build images with cache disabled. This will remove the pulled or build images
           and start building images from scratch. This might take a long time.
 
-  -L, --use-local-cache
+  -L, --build-cache-local
           Uses local cache to build images. No pulled images will be used, but results of local
-          builds in the Docker cache are used instead.
+          builds in the Docker cache are used instead. This will take longer than when the pulled
+          cache is used for the first time, but subsequent '--build-cache-local' builds will be
+          faster as they will use mostly the locally build cache.
+
+          This is default strategy used by the Production image builds.
+
+  -U, --build-cache-pulled
+          Uses images pulled from registry (either DockerHub or GitHub depending on
+          --github-registry flag) to build images. The pulled images will be used as cache.
+          Those builds are usually faster than when ''--build-cache-local'' with the exception if
+          the registry images are not yet updated. The DockerHub images are updated nightly and the
+          GitHub images are updated after merges to master so it might be that the images are still
+          outdated vs. the latest version of the Dockerfiles you are using. In this case, the
+          ''--build-cache-local'' might be faster, especially if you iterate and change the
+          Dockerfiles yourself.
+
+          This is default strategy used by the CI image builds.
+
+  -X, --build-cache-disabled
+          Disables cache during docker builds. This is useful if you want to make sure you want to
+          rebuild everything from scratch.
+
+          This strategy is used by default for both Production and CI images for the scheduled
+          (nightly) builds in CI.
 
   -D, --dockerhub-user
           DockerHub user used to pull, push and build images. Default: apache.
@@ -1199,9 +1224,32 @@ This is the current syntax for  `./breeze <./breeze>`_:
           Force build images with cache disabled. This will remove the pulled or build images
           and start building images from scratch. This might take a long time.
 
-  -L, --use-local-cache
+  -L, --build-cache-local
           Uses local cache to build images. No pulled images will be used, but results of local
-          builds in the Docker cache are used instead.
+          builds in the Docker cache are used instead. This will take longer than when the pulled
+          cache is used for the first time, but subsequent '--build-cache-local' builds will be
+          faster as they will use mostly the locally build cache.
+
+          This is default strategy used by the Production image builds.
+
+  -U, --build-cache-pulled
+          Uses images pulled from registry (either DockerHub or GitHub depending on
+          --github-registry flag) to build images. The pulled images will be used as cache.
+          Those builds are usually faster than when ''--build-cache-local'' with the exception if
+          the registry images are not yet updated. The DockerHub images are updated nightly and the
+          GitHub images are updated after merges to master so it might be that the images are still
+          outdated vs. the latest version of the Dockerfiles you are using. In this case, the
+          ''--build-cache-local'' might be faster, especially if you iterate and change the
+          Dockerfiles yourself.
+
+          This is default strategy used by the CI image builds.
+
+  -X, --build-cache-disabled
+          Disables cache during docker builds. This is useful if you want to make sure you want to
+          rebuild everything from scratch.
+
+          This strategy is used by default for both Production and CI images for the scheduled
+          (nightly) builds in CI.
 
 
   ####################################################################################################
@@ -1450,9 +1498,32 @@ This is the current syntax for  `./breeze <./breeze>`_:
           Force build images with cache disabled. This will remove the pulled or build images
           and start building images from scratch. This might take a long time.
 
-  -L, --use-local-cache
+  -L, --build-cache-local
           Uses local cache to build images. No pulled images will be used, but results of local
-          builds in the Docker cache are used instead.
+          builds in the Docker cache are used instead. This will take longer than when the pulled
+          cache is used for the first time, but subsequent '--build-cache-local' builds will be
+          faster as they will use mostly the locally build cache.
+
+          This is default strategy used by the Production image builds.
+
+  -U, --build-cache-pulled
+          Uses images pulled from registry (either DockerHub or GitHub depending on
+          --github-registry flag) to build images. The pulled images will be used as cache.
+          Those builds are usually faster than when ''--build-cache-local'' with the exception if
+          the registry images are not yet updated. The DockerHub images are updated nightly and the
+          GitHub images are updated after merges to master so it might be that the images are still
+          outdated vs. the latest version of the Dockerfiles you are using. In this case, the
+          ''--build-cache-local'' might be faster, especially if you iterate and change the
+          Dockerfiles yourself.
+
+          This is default strategy used by the CI image builds.
+
+  -X, --build-cache-disabled
+          Disables cache during docker builds. This is useful if you want to make sure you want to
+          rebuild everything from scratch.
+
+          This strategy is used by default for both Production and CI images for the scheduled
+          (nightly) builds in CI.
 
   ****************************************************************************************************
    Flags for pulling/pushing Docker images (both CI and production)
diff --git a/IMAGES.rst b/IMAGES.rst
index 99eaf77..c829176 100644
--- a/IMAGES.rst
+++ b/IMAGES.rst
@@ -115,36 +115,56 @@ parameter to Breeze:
 Using cache during builds
 =========================
 
-Default mechanism used in Breeze for building images uses - as base - images puled from DockerHub or
-GitHub Image Registry. This is in order to speed up local builds and CI builds - instead of 15 minutes
+Default mechanism used in Breeze for building CI images uses images pulled from DockerHub or
+GitHub Image Registry. This is done to speed up local builds and CI builds - instead of 15 minutes
 for rebuild of CI images, it takes usually less than 3 minutes when cache is used. For CI builds this is
-usually the best strategy - to use default "pull" cache - same for Production Image - it's better to rely
-on the "pull" mechanism rather than rebuild the image from the scratch.
+usually the best strategy - to use default "pull" cache. This is default strategy when
+`<BREEZE.rst>`_ builds are performed.
 
-However when you are iterating on the images and want to rebuild them quickly and often you can provide the
-``--use-local-cache`` flag to build commands - this way the standard docker mechanism based on local cache
-will be used. The first time you run it, it will take considerably longer time than if you use the
-default pull mechanism, but then when you do small, incremental changes to local sources, Dockerfile image
-and scripts further rebuilds with --use-local-cache will be considerably faster.
+For Production Image - which is far smaller and faster to build, it's better to use local build cache (the
+standard mechanism that docker uses. This is the default strategy for production images when
+`<BREEZE.rst>`_ builds are performed. The first time you run it, it will take considerably longer time than
+if you use the pull mechanism, but then when you do small, incremental changes to local sources,
+Dockerfile image= and scripts further rebuilds with local build cache will be considerably faster.
+
+You can also disable build cache altogether. This is the strategy used by the scheduled builds in CI - they
+will always rebuild all the images from scratch.
+
+You can change the strategy by providing one of the ``--build-cache-local``, ``--build-cache-pulled`` or
+even ``--build-cache-disabled`` flags when you run Breeze commands. For example:
+
+.. code-block:: bash
+
+  ./breeze build-image --python 3.7 --build-cache-local
+
+Will build the CI image using local build cache (note that it will take quite a long time the first
+time you run it).
 
 .. code-block:: bash
 
-  ./breeze build-image --python 3.7 --production-image --use-local-cache
+  ./breeze build-image --python 3.7 --production-image --build-cache-pulled
+
+Will build the production image with pulled images as cache.
+
+
+.. code-block:: bash
+
+  ./breeze build-image --python 3.7 --production-image --build-cache-disabled
+
+Will build the production image from the scratch.
 
-You can also turn local docker caching by setting DOCKER_CACHE variable to "local" instead of the default
-"pulled" and export it to Breeze.
+You can also turn local docker caching by setting ``DOCKER_CACHE`` variable to "local", "pulled",
+"disabled" and exporting it.
 
 .. code-block:: bash
 
   export DOCKER_CACHE="local"
 
-You can also - if you really want - disable caching altogether by setting this variable to "no-cache".
-This is how "scheduled" builds in our CI are run - those builds take a long time because they
-always rebuild everything from scratch.
+or
 
 .. code-block:: bash
 
-  export DOCKER_CACHE="no-cache"
+  export DOCKER_CACHE="disabled"
 
 
 Choosing image registry
diff --git a/breeze b/breeze
index cc6dacc..16826bc 100755
--- a/breeze
+++ b/breeze
@@ -376,7 +376,7 @@ EOF
         if [[ ${PRODUCTION_IMAGE} == "true" ]]; then
             cat <<EOF
 
-   Use production image.
+   Production image.
 
    Branch name:             ${BRANCH_NAME}
    Docker image:            ${AIRFLOW_PROD_IMAGE}
@@ -384,7 +384,7 @@ EOF
         else
             cat <<EOF
 
-   Use CI image.
+   CI image.
 
    Branch name:             ${BRANCH_NAME}
    Docker image:            ${AIRFLOW_CI_IMAGE}
@@ -675,11 +675,21 @@ function parse_arguments() {
           export DOCKER_CACHE="no-cache"
           export FORCE_BUILD_IMAGES="true"
           shift ;;
-        -L|--use-local-cache)
+        -L|--build-cache-local)
           echo "Use local cache to build images"
           echo
           export DOCKER_CACHE="local"
           shift ;;
+        -U|--build-cache-pulled)
+          echo "Use pulled cache to build images"
+          echo
+          export DOCKER_CACHE="pulled"
+          shift ;;
+        -X|--build-cache-disabled)
+          echo "Use disabled cache to build images"
+          echo
+          export DOCKER_CACHE="disabled"
+          shift ;;
         -P|--force-pull-images)
           echo "Force pulling images before build. Uses pulled images as cache."
           echo
@@ -1052,7 +1062,9 @@ ${CMDNAME} build-image [FLAGS]
 
       Builds docker image (CI or production) without entering the container. You can pass
       additional options to this command, such as '--force-build-image',
-      '--force-pull-image' '--python' '--use-local-cache'' in order to modify build behaviour.
+      '--force-pull-image', '--python', '--build-cache-local' or '-build-cache-pulled'
+      in order to modify build behaviour.
+
       You can also pass '--production-image' flag to build production image rather than CI image.
 
 Flags:
@@ -1516,9 +1528,33 @@ ${FORMATTED_DEFAULT_PROD_EXTRAS}
         Force build images with cache disabled. This will remove the pulled or build images
         and start building images from scratch. This might take a long time.
 
--L, --use-local-cache
+-L, --build-cache-local
         Uses local cache to build images. No pulled images will be used, but results of local
-        builds in the Docker cache are used instead.
+        builds in the Docker cache are used instead. This will take longer than when the pulled
+        cache is used for the first time, but subsequent '--build-cache-local' builds will be
+        faster as they will use mostly the locally build cache.
+
+        This is default strategy used by the Production image builds.
+
+-U, --build-cache-pulled
+        Uses images pulled from registry (either DockerHub or GitHub depending on
+        --github-registry flag) to build images. The pulled images will be used as cache.
+        Those builds are usually faster than when ''--build-cache-local'' with the exception if
+        the registry images are not yet updated. The DockerHub images are updated nightly and the
+        GitHub images are updated after merges to master so it might be that the images are still
+        outdated vs. the latest version of the Dockerfiles you are using. In this case, the
+        ''--build-cache-local'' might be faster, especially if you iterate and change the
+        Dockerfiles yourself.
+
+        This is default strategy used by the CI image builds.
+
+-X, --build-cache-disabled
+        Disables cache during docker builds. This is useful if you want to make sure you want to
+        rebuild everything from scratch.
+
+        This strategy is used by default for both Production and CI images for the scheduled
+        (nightly) builds in CI.
+
 "
 }
 
diff --git a/breeze-complete b/breeze-complete
index af9a314..a3cc1eb 100644
--- a/breeze-complete
+++ b/breeze-complete
@@ -92,7 +92,8 @@ h p: b: i:
 K: V:
 l a: t: d:
 v y n q f
-F P I E: C L
+F P I E: C
+L U X
 D: R: c g: G:
 "
 
@@ -101,7 +102,8 @@ help python: backend: integration:
 kubernetes-mode: kubernetes-version:
 skip-mounting-local-sources install-airflow-version: install-airflow-reference: db-reset
 verbose assume-yes assume-no assume-quit forward-credentials
-force-build-images force-pull-images production-image extras: force-clean-images use-local-cache
+force-build-images force-pull-images production-image extras: force-clean-images
+build-cache-local build-cache-pulled build-cache-disabled
 dockerhub-user: dockerhub-repo: github-registry github-organisation: github-repo:
 postgres-version: mysql-version:
 additional-extras: additional-python-deps: additional-dev-deps: additional-runtime-deps:
diff --git a/scripts/ci/ci_prepare_image_on_ci.sh b/scripts/ci/ci_prepare_ci_image_on_ci.sh
similarity index 91%
copy from scripts/ci/ci_prepare_image_on_ci.sh
copy to scripts/ci/ci_prepare_ci_image_on_ci.sh
index 78cf323..8ced220 100755
--- a/scripts/ci/ci_prepare_image_on_ci.sh
+++ b/scripts/ci/ci_prepare_ci_image_on_ci.sh
@@ -18,7 +18,4 @@
 # shellcheck source=scripts/ci/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/_script_init.sh"
 
-export UPGRADE_TO_LATEST_REQUIREMENTS="false"
-export SKIP_CI_IMAGE_CHECK="false"
-
 build_ci_image_on_ci
diff --git a/scripts/ci/ci_prepare_image_on_ci.sh b/scripts/ci/ci_prepare_prod_image_on_ci.sh
similarity index 89%
rename from scripts/ci/ci_prepare_image_on_ci.sh
rename to scripts/ci/ci_prepare_prod_image_on_ci.sh
index 78cf323..066a43b 100755
--- a/scripts/ci/ci_prepare_image_on_ci.sh
+++ b/scripts/ci/ci_prepare_prod_image_on_ci.sh
@@ -18,7 +18,4 @@
 # shellcheck source=scripts/ci/_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/_script_init.sh"
 
-export UPGRADE_TO_LATEST_REQUIREMENTS="false"
-export SKIP_CI_IMAGE_CHECK="false"
-
-build_ci_image_on_ci
+build_prod_image_on_ci
diff --git a/scripts/ci/libraries/_build_images.sh b/scripts/ci/libraries/_build_images.sh
index 0f1d9aa..7195b12 100644
--- a/scripts/ci/libraries/_build_images.sh
+++ b/scripts/ci/libraries/_build_images.sh
@@ -316,6 +316,11 @@ function print_build_info() {
 # Prepares all variables needed by the CI build. Depending on the configuration used (python version
 # DockerHub user etc. the variables are set so that other functions can use those variables.
 function prepare_ci_build() {
+    # We use pulled docker image cache by default for CI images to  speed up the builds
+    export DOCKER_CACHE=${DOCKER_CACHE:="pulled"}
+    echo
+    echo "Using ${DOCKER_CACHE} cache strategy for the build."
+    echo
     export AIRFLOW_CI_BASE_TAG="${BRANCH_NAME}-python${PYTHON_MAJOR_MINOR_VERSION}-ci"
     export AIRFLOW_CI_LOCAL_MANIFEST_IMAGE="local/${DOCKERHUB_REPO}:${AIRFLOW_CI_BASE_TAG}-manifest"
     export AIRFLOW_CI_REMOTE_MANIFEST_IMAGE="${DOCKERHUB_USER}/${DOCKERHUB_REPO}:${AIRFLOW_CI_BASE_TAG}-manifest"
@@ -473,25 +478,39 @@ function rebuild_ci_image_if_needed_and_confirmed() {
     fi
 }
 
-
-# Builds the CI image in the CI environment.
-# Depending on the type of build (push/pr/scheduled) it will either build it incrementally or
-# from the scratch without cache (the latter for scheduled builds only)
-function build_ci_image_on_ci() {
-    get_ci_environment
-
-    # In case of CRON jobs we run builds without cache and upgrade to latest requirements
+# Determines the strategy to be used for caching based on the type of CI job run.
+# In case of CRON jobs we run builds without cache and upgrade to latest requirements
+function determine_cache_strategy() {
     if [[ "${CI_EVENT_TYPE:=}" == "schedule" ]]; then
         echo
         echo "Disabling cache for scheduled jobs"
         echo
-        export DOCKER_CACHE="no-cache"
+        export DOCKER_CACHE="disabled"
         echo
-        echo "Requirements are upgraded to latest while running Docker build"
+        echo "Requirements are upgraded to latest for scheduled CI build"
         echo
         export UPGRADE_TO_LATEST_REQUIREMENTS="true"
+    else
+        echo
+        echo "Pull cache used for regular CI builds"
+        echo
+        export DOCKER_CACHE="pulled"
+        echo
+        echo "Requirements are not upgraded to latest ones for regular CI builds"
+        echo
+        export UPGRADE_TO_LATEST_REQUIREMENTS="false"
     fi
+}
+
+
+# Builds the CI image in the CI environment.
+# Depending on the type of build (push/pr/scheduled) it will either build it incrementally or
+# from the scratch without cache (the latter for scheduled builds only)
+function build_ci_image_on_ci() {
+    export SKIP_CI_IMAGE_CHECK="false"
 
+    get_ci_environment
+    determine_cache_strategy
     prepare_ci_build
 
     rm -rf "${BUILD_CACHE_DIR}"
@@ -500,12 +519,14 @@ function build_ci_image_on_ci() {
     rebuild_ci_image_if_needed
 
     # Disable force pulling forced above this is needed for the subsequent scripts so that
-    # They do not try to pull/build images again
+    # They do not try to pull/build images again. Also skip the image check entirely for
+    # the rest of the script
     unset FORCE_PULL_IMAGES
     unset FORCE_BUILD
+    export SKIP_CI_IMAGE_CHECK="true"
 }
 
-# Builds CI image - depending on the caching strategy (pulled, local, no-cache) it
+# Builds CI image - depending on the caching strategy (pulled, local, disabled) it
 # passes the necessary docker build flags via DOCKER_CACHE_CI_DIRECTIVE array
 # it also passes the right Build args depending on the configuration of the build
 # selected by Breeze flags or environment variables.
@@ -521,7 +542,7 @@ function build_ci_image() {
     fi
     pull_ci_image_if_needed
 
-    if [[ "${DOCKER_CACHE}" == "no-cache" ]]; then
+    if [[ "${DOCKER_CACHE}" == "disabled" ]]; then
         export DOCKER_CACHE_CI_DIRECTIVE=("--no-cache")
     elif [[ "${DOCKER_CACHE}" == "local" ]]; then
         export DOCKER_CACHE_CI_DIRECTIVE=()
@@ -570,6 +591,11 @@ Docker building ${AIRFLOW_CI_IMAGE}.
 # Prepares all variables needed by the CI build. Depending on the configuration used (python version
 # DockerHub user etc. the variables are set so that other functions can use those variables.
 function prepare_prod_build() {
+    # We use local docker image cache by default for Production images
+    export DOCKER_CACHE=${DOCKER_CACHE:="local"}
+    echo
+    echo "Using ${DOCKER_CACHE} cache strategy for the build."
+    echo
     if [[ "${INSTALL_AIRFLOW_REFERENCE:=}" != "" ]]; then
         # When --install-airflow-reference is used then the image is build from github tag
         EXTRA_DOCKER_PROD_BUILD_FLAGS=(
@@ -638,7 +664,29 @@ function prepare_prod_build() {
 }
 
 
-# Builds PROD image - depending on the caching strategy (pulled, local, no-cache) it
+# Builds the prod image in the CI environment.
+# Depending on the type of build (push/pr/scheduled) it will either build it incrementally or
+# from the scratch without cache (the latter for scheduled builds only)
+function build_prod_image_on_ci() {
+    get_prod_environment
+
+    determine_cache_strategy
+
+    prepare_prod_build
+
+    rm -rf "${BUILD_CACHE_DIR}"
+    mkdir -pv "${BUILD_CACHE_DIR}"
+
+    build_prod_image
+
+    # Disable force pulling forced above this is needed for the subsequent scripts so that
+    # They do not try to pull/build images again
+    unset FORCE_PULL_IMAGES
+    unset FORCE_BUILD
+}
+
+
+# Builds PROD image - depending on the caching strategy (pulled, local, disabled) it
 # passes the necessary docker build flags via DOCKER_CACHE_PROD_DIRECTIVE and
 # DOCKER_CACHE_PROD_BUILD_DIRECTIVE (separate caching options are needed for "build" segment of the image)
 # it also passes the right Build args depending on the configuration of the build
@@ -647,7 +695,7 @@ function build_prod_image() {
     print_build_info
     pull_prod_images_if_needed
 
-    if [[ "${DOCKER_CACHE}" == "no-cache" ]]; then
+    if [[ "${DOCKER_CACHE}" == "disabled" ]]; then
         export DOCKER_CACHE_PROD_DIRECTIVE=("--cache-from" "${AIRFLOW_PROD_BUILD_IMAGE}")
         export DOCKER_CACHE_PROD_BUILD_DIRECTIVE=("--no-cache")
     elif [[ "${DOCKER_CACHE}" == "local" ]]; then
diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index 93a6ec5..c0c171a 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -161,11 +161,6 @@ function initialize_common_environment {
         done
     fi
 
-    # We use pulled docker image cache by default to speed up the builds
-    # but we can also set different docker caching strategy (for example we can use local cache
-    # to build the images in case we iterate with the image
-    export DOCKER_CACHE=${DOCKER_CACHE:="pulled"}
-
     # By default we are not upgrading to latest requirements when building Docker CI image
     # This will only be done in cron jobs
     export UPGRADE_TO_LATEST_REQUIREMENTS=${UPGRADE_TO_LATEST_REQUIREMENTS:="false"}


[airflow] 04/05: Make Production Dockerfile OpenShift-compatible (#9545)

Posted by po...@apache.org.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 3749cfe9c5dae1b547c3ab650333fbed06b920ce
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Sat Jun 27 14:29:55 2020 +0200

    Make Production Dockerfile OpenShift-compatible (#9545)
    
    OpenShift (and other Kubernetes platforms) often start
    containers with a random user and the root group. This is
    described in https://docs.openshift.com/container-platform/3.7/creating_images/guidelines.html
    
    All the files created by the "airflow" user now belong to the
    'root' group, and the root group has the same access to those
    files as the Airflow user.
    
    Additionally, the random user automatically gets an
    /etc/passwd entry named 'default'. The name of the user
    can be changed by setting the USER_NAME variable when
    starting the container.
    
    Closes #9248
    Closes #8706
    
    (cherry picked from commit cf510a30fb4bd2aae0860712f02e205b1d6c8f53)
---
 Dockerfile                            | 34 ++++++++++++++++++++++++----------
 IMAGES.rst                            | 12 +++++++++++-
 scripts/ci/libraries/_build_images.sh |  1 -
 scripts/prod/entrypoint_prod.sh       |  9 +++++++++
 4 files changed, 44 insertions(+), 12 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 822b16d..bd21d06 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -215,8 +215,7 @@ RUN pip install --user "${AIRFLOW_INSTALL_SOURCES}[${AIRFLOW_EXTRAS}]${AIRFLOW_I
     find /root/.local/ -name '*.pyc' -print0 | xargs -0 rm -r && \
     find /root/.local/ -type d -name '__pycache__' -print0 | xargs -0 rm -r
 
-RUN \
-    AIRFLOW_SITE_PACKAGE="/root/.local/lib/python${PYTHON_MAJOR_MINOR_VERSION}/site-packages/airflow"; \
+RUN AIRFLOW_SITE_PACKAGE="/root/.local/lib/python${PYTHON_MAJOR_MINOR_VERSION}/site-packages/airflow"; \
     if [[ -f "${AIRFLOW_SITE_PACKAGE}/www_rbac/package.json" ]]; then \
         WWW_DIR="${AIRFLOW_SITE_PACKAGE}/www_rbac"; \
     elif [[ -f "${AIRFLOW_SITE_PACKAGE}/www/package.json" ]]; then \
@@ -228,6 +227,10 @@ RUN \
         rm -rf "${WWW_DIR}/node_modules"; \
     fi
 
+# make sure that all directories and files in .local are also group accessible
+RUN find /root/.local -executable -print0 | xargs --null chmod g+x && \
+    find /root/.local -print0 | xargs --null chmod g+rw
+
 ##############################################################################################
 # This is the actual Airflow image - much smaller than the build one. We copy
 # installed Airflow and all its dependencies from the build image to make it smaller.
@@ -326,36 +329,47 @@ RUN pip install --upgrade pip==${PIP_VERSION}
 ENV AIRFLOW_UID=${AIRFLOW_UID}
 ENV AIRFLOW_GID=${AIRFLOW_GID}
 
+ENV AIRFLOW__CORE__LOAD_EXAMPLES="false"
+
+ARG AIRFLOW_USER_HOME_DIR=/home/airflow
+ENV AIRFLOW_USER_HOME_DIR=${AIRFLOW_USER_HOME_DIR}
+
 RUN addgroup --gid "${AIRFLOW_GID}" "airflow" && \
     adduser --quiet "airflow" --uid "${AIRFLOW_UID}" \
-        --ingroup "airflow" \
-        --home /home/airflow
+        --gid "${AIRFLOW_GID}" \
+        --home "${AIRFLOW_USER_HOME_DIR}"
 
 ARG AIRFLOW_HOME
 ENV AIRFLOW_HOME=${AIRFLOW_HOME}
 
+# Make Airflow files belong to the root group and be group-accessible. This is to accommodate the guidelines from
+# OpenShift https://docs.openshift.com/enterprise/3.0/creating_images/guidelines.html
 RUN mkdir -pv "${AIRFLOW_HOME}"; \
     mkdir -pv "${AIRFLOW_HOME}/dags"; \
     mkdir -pv "${AIRFLOW_HOME}/logs"; \
-    chown -R "airflow" "${AIRFLOW_HOME}"
+    chown -R "airflow:root" "${AIRFLOW_USER_HOME_DIR}" "${AIRFLOW_HOME}"; \
+    find "${AIRFLOW_HOME}" -executable -print0 | xargs --null chmod g+x && \
+        find "${AIRFLOW_HOME}" -print0 | xargs --null chmod g+rw
 
-COPY --chown=airflow:airflow --from=airflow-build-image /root/.local "/home/airflow/.local"
+COPY --chown=airflow:root --from=airflow-build-image /root/.local "${AIRFLOW_USER_HOME_DIR}/.local"
 
 COPY scripts/prod/entrypoint_prod.sh /entrypoint
 COPY scripts/prod/clean-logs.sh /clean-logs
 
 RUN chmod a+x /entrypoint /clean-logs
 
-USER airflow
+# Make /etc/passwd root-group-writeable so that user can be dynamically added by OpenShift
+# See https://github.com/apache/airflow/issues/9248
+RUN chmod g=u /etc/passwd
 
-ENV PATH="/home/airflow/.local/bin:${PATH}"
+ENV PATH="${AIRFLOW_USER_HOME_DIR}/.local/bin:${PATH}"
 ENV GUNICORN_CMD_ARGS="--worker-tmp-dir /dev/shm"
 
 WORKDIR ${AIRFLOW_HOME}
 
-ENV AIRFLOW__CORE__LOAD_EXAMPLES="false"
-
 EXPOSE 8080
 
+USER ${AIRFLOW_UID}
+
 ENTRYPOINT ["/usr/bin/dumb-init", "--", "/entrypoint"]
 CMD ["airflow", "--help"]
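
The group-permission step added to the Dockerfile above can be tried out on a throwaway directory. The following is a minimal sketch, assuming GNU find, xargs and stat are available; the function name make_group_accessible and the demo files are hypothetical and are not part of the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the two find/xargs/chmod calls from the Dockerfile.
make_group_accessible() {
    local target="$1"
    # Anything the current user can execute (or enter) becomes group-executable
    find "${target}" -executable -print0 | xargs --null chmod g+x
    # Everything becomes group-readable and group-writable
    find "${target}" -print0 | xargs --null chmod g+rw
}

# Throwaway directory standing in for /root/.local (an assumption for the demo)
demo_dir="$(mktemp -d)"
mkdir -m 700 "${demo_dir}/bin"
printf '#!/bin/sh\n' > "${demo_dir}/bin/tool"
chmod 700 "${demo_dir}/bin/tool"
echo "data" > "${demo_dir}/config.txt"
chmod 600 "${demo_dir}/config.txt"

make_group_accessible "${demo_dir}"

# Print the resulting modes: directories and the script gain g+rwx, the data file g+rw
stat -c '%a %n' "${demo_dir}/bin" "${demo_dir}/bin/tool" "${demo_dir}/config.txt"
```

Because only entries that are already executable receive g+x, plain data files do not become executable for the group.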
diff --git a/IMAGES.rst b/IMAGES.rst
index 8b4b8df..99eaf77 100644
--- a/IMAGES.rst
+++ b/IMAGES.rst
@@ -399,7 +399,12 @@ The following build arguments (``--build-arg`` in docker build command) can be u
 +------------------------------------------+------------------------------------------+------------------------------------------+
 | ``AIRFLOW_UID``                          | ``50000``                                | Airflow user UID                         |
 +------------------------------------------+------------------------------------------+------------------------------------------+
-| ``AIRFLOW_GID``                          | ``50000``                                | Airflow group GID                        |
+| ``AIRFLOW_GID``                          | ``50000``                                | Airflow group GID. Note that most files  |
+|                                          |                                          | created on behalf of airflow user belong |
+|                                          |                                          | to the ``root`` group (0) to keep        |
+|                                          |                                          | OpenShift Guidelines compatibility       |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_USER_HOME_DIR``                | ``/home/airflow``                        | Home directory of the Airflow user       |
 +------------------------------------------+------------------------------------------+------------------------------------------+
 | ``PIP_VERSION``                          | ``19.0.2``                               | version of PIP to use                    |
 +------------------------------------------+------------------------------------------+------------------------------------------+
@@ -621,6 +626,11 @@ Using the PROD image
 
 The PROD image entrypoint works as follows:
 
+* If the user is not "airflow" (its user id is not defined in /etc/passwd) and the group id of the user is 0 (root),
+  then the user is dynamically added to /etc/passwd at entry, using the USER_NAME variable to define the user name.
+  This is in order to accommodate the
+  `OpenShift Guidelines <https://docs.openshift.com/enterprise/3.0/creating_images/guidelines.html>`_
+
 * If ``AIRFLOW__CORE__SQL_ALCHEMY_CONN`` variable is passed to the container and it is either mysql or postgres
   SQL alchemy connection, then the connection is checked and the script waits until the database is reachable.
 
diff --git a/scripts/ci/libraries/_build_images.sh b/scripts/ci/libraries/_build_images.sh
index fcb2cc8..0f1d9aa 100644
--- a/scripts/ci/libraries/_build_images.sh
+++ b/scripts/ci/libraries/_build_images.sh
@@ -678,7 +678,6 @@ function build_prod_image() {
         --build-arg ADDITIONAL_AIRFLOW_EXTRAS="${ADDITIONAL_AIRFLOW_EXTRAS}" \
         --build-arg ADDITIONAL_PYTHON_DEPS="${ADDITIONAL_PYTHON_DEPS}" \
         --build-arg ADDITIONAL_DEV_DEPS="${ADDITIONAL_DEV_DEPS}" \
-        --build-arg ADDITIONAL_RUNTIME_DEPS="${ADDITIONAL_RUNTIME_DEPS}" \
         "${DOCKER_CACHE_PROD_BUILD_DIRECTIVE[@]}" \
         -t "${AIRFLOW_PROD_BUILD_IMAGE}" \
         --target "airflow-build-image" \
diff --git a/scripts/prod/entrypoint_prod.sh b/scripts/prod/entrypoint_prod.sh
index 220e9b7..f54a57f 100755
--- a/scripts/prod/entrypoint_prod.sh
+++ b/scripts/prod/entrypoint_prod.sh
@@ -90,6 +90,15 @@ function verify_db_connection {
     fi
 }
 
+if ! whoami &> /dev/null; then
+  if [[ -w /etc/passwd ]]; then
+    echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${AIRFLOW_USER_HOME_DIR}:/sbin/nologin" \
+        >> /etc/passwd
+  fi
+  export HOME="${AIRFLOW_USER_HOME_DIR}"
+fi
+
+
 # if no DB configured - use sqlite db by default
 AIRFLOW__CORE__SQL_ALCHEMY_CONN="${AIRFLOW__CORE__SQL_ALCHEMY_CONN:="sqlite:///${AIRFLOW_HOME}/airflow.db"}"
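
The /etc/passwd line appended by the entrypoint change above can be reproduced in isolation. This is a minimal sketch only: build_passwd_entry is a hypothetical helper, and the UID and home directory in the example call are made-up values, not taken from the commit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: formats the same passwd line the entrypoint appends.
# Fields: name:password:UID:GID:comment:home:shell, with the GID fixed to 0 (root).
build_passwd_entry() {
    local uid="$1"
    local user_name="${2:-default}"   # falls back to 'default' when empty or unset
    local home_dir="$3"
    echo "${user_name}:x:${uid}:0:${user_name} user:${home_dir}:/sbin/nologin"
}

# Example: a container started with an arbitrary UID and no USER_NAME set
build_passwd_entry 1000620000 "" /home/airflow
# prints: default:x:1000620000:0:default user:/home/airflow:/sbin/nologin
```

In the real entrypoint the line is only written when whoami fails and /etc/passwd is writable, which is why the Dockerfile runs chmod g=u /etc/passwd.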
 


[airflow] 01/05: Remove redundant code from breeze initialization (#9375)

Posted by po...@apache.org.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 3b2d5cce801968102a09aee947875a96e2ed7b75
Author: Felix Uellendall <fe...@users.noreply.github.com>
AuthorDate: Mon Jun 22 19:36:44 2020 +0200

    Remove redundant code from breeze initialization (#9375)
    
    
    (cherry picked from commit 097180b1fd79244a9f20f8f00b3f767cc6e103cb)
---
 scripts/ci/libraries/_initialization.sh | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/scripts/ci/libraries/_initialization.sh b/scripts/ci/libraries/_initialization.sh
index f70655b..93a6ec5 100644
--- a/scripts/ci/libraries/_initialization.sh
+++ b/scripts/ci/libraries/_initialization.sh
@@ -52,15 +52,11 @@ function initialize_common_environment {
     # Temporary dir used well ... temporarily
     export TMP_DIR="${AIRFLOW_SOURCES}/tmp"
 
-    # Create those folders above in case they do not exist
+    # Create useful directories if not yet created
     mkdir -p "${TMP_DIR}"
     mkdir -p "${FILES_DIR}"
-
-    # Create useful directories if not yet created
     mkdir -p "${AIRFLOW_SOURCES}/.mypy_cache"
     mkdir -p "${AIRFLOW_SOURCES}/logs"
-    mkdir -p "${AIRFLOW_SOURCES}/tmp"
-    mkdir -p "${AIRFLOW_SOURCES}/files"
     mkdir -p "${AIRFLOW_SOURCES}/dist"
 
     # Read common values used across Breeze and CI scripts


[airflow] 03/05: Gunicorn works better if temporary folder uses tmpfs (#9534)

Posted by po...@apache.org.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 84d6edd8eef3b1e8d57eb333e1457d0b36bfa787
Author: Jarek Potiuk <ja...@polidea.com>
AuthorDate: Fri Jun 26 16:41:21 2020 +0200

    Gunicorn works better if temporary folder uses tmpfs (#9534)
    
    This is discussed in the documentation of gunicorn.
    You can find more information here: https://docs.gunicorn.org/en/stable/faq.html#how-do-i-avoid-gunicorn-excessively-blocking-in-os-fchmod
    
    Since we are using docker, we always have shared memory
    available (at least 64MB).
    
    Closes #9379
    
    (cherry picked from commit 2cf167b047063271c0df12344abc0e14940457af)
---
 Dockerfile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Dockerfile b/Dockerfile
index 56fdf0f..822b16d 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -349,6 +349,7 @@ RUN chmod a+x /entrypoint /clean-logs
 USER airflow
 
 ENV PATH="/home/airflow/.local/bin:${PATH}"
+ENV GUNICORN_CMD_ARGS="--worker-tmp-dir /dev/shm"
 
 WORKDIR ${AIRFLOW_HOME}
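
The commit hard-codes /dev/shm as the gunicorn worker temporary directory. Outside of this image, one might guard that choice. The following is a rough sketch, assuming a fallback to /tmp is acceptable; pick_worker_tmp_dir is a hypothetical helper and the fallback is not something the commit does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: use the candidate directory when it exists and is
# writable, otherwise fall back to another directory.
pick_worker_tmp_dir() {
    local candidate="${1:-/dev/shm}"
    local fallback="${2:-/tmp}"
    if [ -d "${candidate}" ] && [ -w "${candidate}" ]; then
        echo "${candidate}"
    else
        echo "${fallback}"
    fi
}

# GUNICORN_CMD_ARGS is read by gunicorn at startup; --worker-tmp-dir is the
# same option the Dockerfile sets.
GUNICORN_CMD_ARGS="--worker-tmp-dir $(pick_worker_tmp_dir)"
export GUNICORN_CMD_ARGS
echo "${GUNICORN_CMD_ARGS}"
```

Inside the image the guard is unnecessary, since the commit message notes that Docker always provides shared memory.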
 


[airflow] 02/05: Add missing precommit-hook ids to breeze-complete (#9524)

Posted by po...@apache.org.

potiuk pushed a commit to branch v1-10-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit e348037a4212f7d063a3957b29d3a135e4e54253
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Fri Jun 26 10:17:16 2020 +0100

    Add missing precommit-hook ids to breeze-complete (#9524)
    
    (cherry picked from commit 1787057ad82037cdbae5de4128c49928de980f4c)
---
 BREEZE.rst      | 11 ++++++-----
 breeze-complete |  3 +++
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/BREEZE.rst b/BREEZE.rst
index 3e9305a..441c19b 100644
--- a/BREEZE.rst
+++ b/BREEZE.rst
@@ -1215,13 +1215,14 @@ This is the current syntax for  `./breeze <./breeze>`_:
         Run selected static checks for currently changed files. You should specify static check that
         you would like to run or 'all' to run all checks. One of:
 
-                 all airflow-config-yaml bat-tests build check-apache-license
+                 all airflow-config-yaml bat-tests build check-apache-license check-builtin-literals
                  check-executables-have-shebangs check-hooks-apply check-integrations
                  check-merge-conflict check-xml debug-statements detect-private-key doctoc
-                 end-of-file-fixer fix-encoding-pragma flake8 forbid-tabs insert-license
-                 language-matters lint-dockerfile mixed-line-ending mypy pydevd python2-compile
-                 python2-fastcheck python-no-log-warn rst-backticks setup-order shellcheck
-                 trailing-whitespace update-breeze-file update-extras update-local-yml-file yamllint
+                 dont-use-safe-filter end-of-file-fixer fix-encoding-pragma flake8 forbid-tabs
+                 insert-license language-matters lint-dockerfile lint-openapi mixed-line-ending mypy
+                 pydevd python2-compile python2-fastcheck python-no-log-warn rst-backticks
+                 setup-order shellcheck trailing-whitespace update-breeze-file update-extras
+                 update-local-yml-file yamllint
 
         You can pass extra arguments including options to the pre-commit framework as
         <EXTRA_ARGS> passed after --. For example:
diff --git a/breeze-complete b/breeze-complete
index 6bde8fa..af9a314 100644
--- a/breeze-complete
+++ b/breeze-complete
@@ -47,6 +47,7 @@ airflow-config-yaml
 bat-tests
 build
 check-apache-license
+check-builtin-literals
 check-executables-have-shebangs
 check-hooks-apply
 check-integrations
@@ -55,6 +56,7 @@ check-xml
 debug-statements
 detect-private-key
 doctoc
+dont-use-safe-filter
 end-of-file-fixer
 fix-encoding-pragma
 flake8
@@ -62,6 +64,7 @@ forbid-tabs
 insert-license
 language-matters
 lint-dockerfile
+lint-openapi
 mixed-line-ending
 mypy
 pydevd