Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 07:08:44 UTC

[GitHub] [airflow] potiuk opened a new pull request #20664: Switch to buildx to build airflow images

potiuk opened a new pull request #20664:
URL: https://github.com/apache/airflow/pull/20664


   BuildKit is a much more modern Docker build mechanism. It supports
   multi-architecture builds, which makes it suitable for our future ARM
   support. It also has a nicer UI, much more sophisticated caching
   mechanisms, and better support for multi-segment builds.
   
   BuildKit was promoted to official quite a while ago and it is
   rather stable now. We can also now install the BuildKit plugin (buildx)
   for docker, which adds capabilities for building and managing cache using
   dedicated builders (previously BuildKit cache was managed using rather
   complex external tools).
   
   This gives us an opportunity to vastly
   simplify our build scripts, because BuildKit has a much more robust caching
   mechanism than the old docker build (which forced us to pull images
   before using them as cache).
   
   We had a lot of complexity involved in efficient caching
   but with BuildKit all that can be vastly simplified and we can
   get rid of:
   
     * keeping base python images in our registry
     * keeping build segments for prod image in our registry
     * keeping manifest images in our registry
     * deciding when to pull or pull&build image (not needed now, we can
       always build image with --cache-from and buildkit will pull cached
       layers as needed)
     * building the image when performing pre-commit (rather than that
       we simply encourage users to rebuild the image via breeze command)
     * pulling the images before building
     * separate 'build' cache kept in our registry (not needed any more
       as buildkit allows keeping cache for all segments of a multi-segmented
       build in a single cache)
     * the manual spinner (the nice animated tty UI of buildkit eliminates
       the need for it)
     * and a number of other complexities.
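As a rough illustration of the "always build with --cache-from" point in the list above, the sketch below assembles such a command. The cache reference, the image tag and the function name are placeholders made up for this example and are not taken from the PR. The function only prints the command, so it can be inspected before anything is run.

```shell
# Hypothetical cache reference; substitute the registry image that holds your cache.
CACHE_REF="${CACHE_REF:-registry.example.com/my-project/ci:cache}"

# Prints (does not run) a buildx command that reuses remote cache layers.
# No "docker pull" is issued first: the cache is referenced directly.
# $1 -> tag for the resulting image
print_cached_build_command() {
    printf '%s\n' "docker buildx build . --cache-from=type=registry,ref=${CACHE_REF} --tag=$1 --load"
}

print_cached_build_command "my-image:0.0.1"
```

Printing rather than executing keeps the sketch safe to try on a machine without docker; piping the output to a shell would run the actual build.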
   
   Depends on #20238
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782051292



##########
File path: breeze
##########
@@ -1015,24 +1003,6 @@ function breeze::parse_arguments() {
             # if not set here, docker cached is determined later, depending on type of image to be built
             shift
             ;;
-        -P | --force-pull-images)

Review comment:
       No need to force-pull the image or to pull the base Python image. By default we do not let the user rebuild the image before rebasing to the latest main (it can be forced, but it fails by default), so pulling the latest Python image and the latest cache should always be good.







[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782055398



##########
File path: scripts/ci/libraries/_build_images.sh
##########
@@ -112,27 +112,60 @@ function build_images::forget_last_answer() {
     fi
 }
 
-function build_images::confirm_via_terminal() {
-    echo >"${DETECTED_TERMINAL}"
-    echo >"${DETECTED_TERMINAL}"
+
+function build_images::reconfirm_rebuilding_if_not_rebased() {

Review comment:
       Rather than handling all the different cases when the user is not rebased, we now simply (by default) check and require the user's branch to be rebased to the latest `main` when rebuilding the image (and warn the user that not doing so might take a long time). This cuts down the number of cases we had to handle and eliminates the need to store the Python base image in ghcr.io (because different branches and locally built images could have different base images).
   
   By asking the user to rebase when rebuilding we make sure that the same base Python image is used (and it also makes it possible to use the `--pull` flag on all rebuilds to always pull the latest base Python image).
   
   This might have a side effect only for a very short while when a new image is released and no "main" build has succeeded yet, but with the recent stability improvements we had in flaky tests, this case should be very limited.
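A minimal sketch of what such a "rebased to latest main" check could look like, assuming plain git is available. Both function names and the `origin/main` default are hypothetical and are not the functions added in this PR. The check relies on `git merge-base --is-ancestor`, which succeeds when the first commit is an ancestor of the second.

```shell
# Returns 0 when the given upstream ref is an ancestor of HEAD,
# i.e. the current branch already contains everything from upstream.
# $1 -> upstream ref (hypothetical default: origin/main)
is_rebased_on() {
    git merge-base --is-ancestor "${1:-origin/main}" HEAD
}

# Hypothetical wrapper: refuse to rebuild unless rebased or explicitly forced.
# $1 -> upstream ref
# $2 -> "true" to skip the check (the "it can be forced" case)
confirm_rebuild_allowed() {
    if [ "${2:-false}" = "true" ] || is_rebased_on "$1"; then
        return 0
    fi
    echo "Branch is not rebased on $1; rebuilding may take a long time." >&2
    return 1
}
```

A caller would typically run `git fetch` first and then call `confirm_rebuild_allowed origin/main false` before starting the build.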







[GitHub] [airflow] mik-laj commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r786415816



##########
File path: scripts/ci/libraries/_build_images.sh
##########
@@ -800,6 +624,12 @@ function build_images::build_prod_images() {
         echo
         exit 1
     fi
+    if [[ ${PREPARE_BUILDX_CACHE} == "true" ]]; then
+        # Cache for prod image contains also build stage for buildx when mode=max specified!
+        docker_cache_prod_directive+=(
+            "--cache-to=type=registry,ref=${AIRFLOW_PROD_IMAGE}:cache,mode=max"

Review comment:
       Have you considered using the `gha` cache?
   > The `gha` type exports cache through [the Github Actions Cache service API](https://github.com/tonistiigi/go-actions-cache/blob/master/api.md#authentication).
   
   https://github.com/docker/buildx/blob/master/docs/reference/buildx_build.md#-export-build-cache-to-an-external-cache-destination---cache-to







[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1009782503


   Right! This is the last one remaining from the image optimization steps. Switching to buildx/buildkit is a major improvement in the way our images are built. I am also going to test whether our Self-hosted images will cope well with building and pushing the cache (I need to make sure that the buildx plugin is installed for it).
   
   Really looking forward to getting this one merged, as it will help a lot with building ARM images and will greatly improve the ./breeze experience for all breeze users (not to mention -500 lines of bash code that were only needed because, before buildkit caching, I had to re-implement pretty much the whole caching experience for the "interactive use case" with breeze).
   
   BTW. I am super happy that this code can be removed. It was a total pain in the neck to add, maintain and fix it, and to implement all the workarounds just to overcome the limitations of the original docker caching schemes. Fortunately the docker people did a great job in the way caching has been implemented in buildkit, and the buildx plugin allows us to manage it efficiently and automatically.
   
   After this is merged and works for a few days we will be able to remove half of the images we keep in ghcr.io :D





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782062508



##########
File path: Dockerfile.ci
##########
@@ -98,8 +98,7 @@ RUN apt-get update \
 # Only copy mysql/mssql installation scripts for now - so that changing the other

Review comment:
       Why not :). Good point.







[GitHub] [airflow] potiuk edited a comment on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005562336


   > Whats buildx vs buildkit?
   
   Essentially the same. Buildx is a plugin you can install in docker to make more capabilities of buildkit available through the "docker buildx build" command (and a number of management commands). You do not need it to run "buildkit-enabled" builds (`DOCKER_BUILDKIT=1` is enough in docker 18.09+) but you need it, for example, to build and push cache to the github registry.
   
   An example of that is our prod image. It is a multi-segmented image, so in order to prepare a good cache for the builds I need to do it with a command similar to `docker buildx build . --cache-to=type=registry,ref=ghcr.io/,........,mode=max` (mode=max means that the cache includes layers from all segments, which means that when you build a multi-segmented build one --cache-from is enough).
   
   This was missing in buildkit about a year ago, when I last checked: you had to use some strange combination of then-unreleased tools from "moby" to build and refresh the cache. Now with the plugin it's a "breeze" to manage and prepare the cache.
   
   In our case users will not have to install the buildx plugin, but it will have to be available (eventually) on our self-hosted runners to refresh the cache on main builds (I will add it) and any time you want to manually refresh the cache with `./breeze prepare-build-cache`.
   
   https://docs.docker.com/engine/reference/commandline/buildx_build/
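To make the buildx/buildkit distinction above concrete, here is a small sketch that picks a build invocation depending on whether the buildx plugin is installed. The function name is made up for this example. It relies only on `docker buildx version` exiting non-zero when the plugin (or docker itself) is not available.

```shell
# Prints the build invocation to use, depending on whether the buildx
# plugin is installed.
select_build_command() {
    if docker buildx version >/dev/null 2>&1; then
        # buildx is present: cache can also be built and pushed to a registry.
        echo "docker buildx build"
    else
        # Plain BuildKit-enabled build: fine for building images locally,
        # but per the comment above, pushing the cache needs buildx.
        echo "DOCKER_BUILDKIT=1 docker build"
    fi
}

select_build_command
```

This mirrors the split described in the comment: regular users can stay on the second branch, while runners that refresh the cache need the first.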
   





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782057519



##########
File path: scripts/ci/libraries/_push_pull_remove_images.sh
##########
@@ -44,213 +44,22 @@ function push_pull_remove_images::push_image_with_retries() {
 }
 
 
-# Pulls image in case it is needed (either has never been pulled or pulling was forced
+# Pulls image in case it is missing
 # Should be run with set +e
 # Parameters:
 #   $1 -> image to pull
-#   $2 - fallback image
-function push_pull_remove_images::pull_image_if_not_present_or_forced() {
+function push_pull_remove_images::pull_image_if_missing() {
     local image_to_pull="${1}"
     local image_hash
     image_hash=$(docker images -q "${image_to_pull}" 2> /dev/null || true)
-    local pull_image=${FORCE_PULL_IMAGES}
-
     if [[ -z "${image_hash=}" ]]; then
-        pull_image="true"
-    fi
-    if [[ "${pull_image}" == "true" ]]; then
         echo
         echo "Pulling the image ${image_to_pull}"
         echo
         docker pull "${image_to_pull}"
     fi
 }
 
-# Rebuilds python base image from the latest available Python version if it has been updated

Review comment:
       Same here - all that was needed to check and pull the right images before we could use them as cache. This can be removed now completely.







[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005438706


   cc: @Bowrna @edithturn: this will make your job much easier, as it removes a lot of the "caching" complexity. That's why I did not want you to start looking at it earlier, as I knew we were going to get it WAAAAY simpler with BuildKit.





[GitHub] [airflow] potiuk edited a comment on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1008843903


   > FYI "Video can't be played because the file is corrupt"
   
   Works for me in Chrome :) On Linux. 





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782056909



##########
File path: scripts/ci/libraries/_md5sum.sh
##########
@@ -152,22 +152,3 @@ function md5sum::check_if_docker_build_is_needed() {
         fi
     fi
 }
-

Review comment:
       We do not need to pull the image at all now - buildkit caching is checked without pulling the image first.







[GitHub] [airflow] uranusjr commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
uranusjr commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005457445


   I’m not too familiar with buildx in the first place and will dismiss my request for review.





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1008843903


   > FYI "Video can't be played because the file is corrupt"
   
   Works for me in Chrome :) 





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r786712445



##########
File path: scripts/ci/libraries/_build_images.sh
##########
@@ -800,6 +624,12 @@ function build_images::build_prod_images() {
         echo
         exit 1
     fi
+    if [[ ${PREPARE_BUILDX_CACHE} == "true" ]]; then
+        # Cache for prod image contains also build stage for buildx when mode=max specified!
+        docker_cache_prod_directive+=(
+            "--cache-to=type=registry,ref=${AIRFLOW_PROD_IMAGE}:cache,mode=max"

Review comment:
       Yep. I did consider it, but:
   
   > Github Actions cache saves both cache metadata and layers to GitHub's Cache service. This cache currently has a size limit of 10GB that is shared across different caches in the repo. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 10 GB. Recycling caches too often can result in slower runtimes overall.
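For comparison, the two cache backends discussed in this thread differ only in the `--cache-to` directive passed to buildx. The sketch below prints the directive for each. The function name and the example image reference are hypothetical; `type=registry` and `type=gha` are the backend names used in the buildx documentation linked earlier.

```shell
# Prints the --cache-to directive for the chosen backend.
# $1 -> backend: "registry" or "gha"
# $2 -> cache image reference (only used by the registry backend)
cache_to_directive() {
    case "$1" in
        registry) echo "--cache-to=type=registry,ref=$2,mode=max" ;;
        gha)      echo "--cache-to=type=gha,mode=max" ;;
        *)        echo "unknown cache backend: $1" >&2; return 1 ;;
    esac
}

cache_to_directive registry "registry.example.com/my-project/prod:cache"
cache_to_directive gha
```

With the registry backend the cache lives next to the images and is not subject to the shared size limit quoted above, which is the trade-off the reply points at.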







[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782051945



##########
File path: breeze
##########
@@ -3536,13 +3505,6 @@ function breeze::run_breeze_command() {
         docker_engine_resources::check_all_resources
         runs::run_prepare_provider_documentation "${@}"
         ;;
-    perform_push_image)

Review comment:
       We do not need to push the image from breeze any more. 'prepare cache' takes care of it via the --cache-to directive during the build.







[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782045197



##########
File path: IMAGES.rst
##########
@@ -230,45 +229,41 @@ or
 Naming conventions
 ==================
 
-By default images are pulled and pushed from and to Github Container registry when you use Breeze's push-image
-or build commands.
-
-We are using GitHub Container Registry as build cache.The images are all in organization wide "apache/"
-namespace. We are adding "airflow-" as prefix for image names of all Airflow images.
-The images are linked to the repository via ``org.opencontainers.image.source`` label in the image.
+By default images we are using cache for images in Github Container registry. We are using GitHub
+Container Registry as development image cache and CI registry for build images.
+The images are all in organization wide "apache/" namespace. We are adding "airflow-" as prefix for
+the image names of all Airflow images. The images are linked to the repository
+via ``org.opencontainers.image.source`` label in the image.
 
 See https://docs.github.com/en/packages/learn-github-packages/connecting-a-repository-to-a-package
 
 Naming convention for the GitHub packages.
 
-Images with a commit SHA (built for pull requests and pushes)
+Images with a commit SHA (built for pull requests and pushes). Those are images that are snapshot of the
+currently run build. They are built once per each build and pulled by each test job.
 
 .. code-block:: bash
 
   ghcr.io/apache/airflow/<BRANCH>/ci/python<X.Y>:<COMMIT_SHA>         - for CI images
   ghcr.io/apache/airflow/<BRANCH>/prod/python<X.Y>:<COMMIT_SHA>       - for production images
 
-We do not push Base Python images and prod-build images when we prepare COMMIT builds, because those
-images are never rebuilt locally, so there is no need to store base images specific for those builds.
 
-Latest images (pushed when main merge succeeds):
+The cache images (pushed when main merge succeeds) are kept with ``cache`` tag:
 
 .. code-block:: bash
 
-  ghcr.io/apache/airflow/<BRANCH>/python:<X.Y>-slim-buster        - for base Python images
-  ghcr.io/apache/airflow/<BRANCH>/ci/python<X.Y>:latest           - for CI images
-  ghcr.io/apache/airflow/<BRANCH>/ci-manifest/python<X.Y>:latest  - for CI Manifest images
-  ghcr.io/apache/airflow/<BRANCH>/prod/python<X.Y>:latest         - for production images
-  ghcr.io/apache/airflow/<BRANCH>/prod-build/python<X.Y>:latest   - for production build stage

Review comment:
       * 3/5 of the images can be scrapped now. Just :cache for PROD and CI is enough to efficiently rebuild the image.
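A small sketch of how the two remaining cache references could be derived. It assumes that the cache images reuse the path pattern shown for the commit-SHA images in the diff above (`ghcr.io/apache/airflow/<BRANCH>/<ci|prod>/python<X.Y>`) with the `cache` tag; that pattern is an assumption based on the quoted text, and the function name is made up for this example.

```shell
# Prints the cache image reference for a branch, image kind and Python version.
# $1 -> branch (e.g. "main")
# $2 -> image kind: "ci" or "prod"
# $3 -> Python version (e.g. "3.7")
cache_image_ref() {
    echo "ghcr.io/apache/airflow/$1/$2/python$3:cache"
}

cache_image_ref main ci 3.7
cache_image_ref main prod 3.7
```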







[GitHub] [airflow] ashb commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005556253


   Whats buildx vs buildkit?





[GitHub] [airflow] mik-laj commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r781159629



##########
File path: Dockerfile
##########
@@ -166,10 +166,18 @@ ARG INSTALL_PROVIDERS_FROM_SOURCES="false"
 # But it also can be `.` from local installation or GitHub URL pointing to specific branch or tag

Review comment:
       Should we add information about frontend?
   ```
   # syntax = docker/dockerfile:1.3
   ```







[GitHub] [airflow] potiuk edited a comment on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1008843903


   > FYI "Video can't be played because the file is corrupt"
   
   Works for me in Chrome :) On Linux. Also in incognito mode.





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782049978



##########
File path: IMAGES.rst
##########
@@ -502,116 +487,58 @@ This builds the CI image in version 3.7 with default extras ("all").
 
 .. code-block:: bash
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" --tag my-image:0.0.1
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+     --pull \
+     --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" --tag my-image:0.0.1
 
 
 This builds the CI image in version 3.6 with "gcp" extra only.
 
 .. code-block:: bash
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+    --pull \
+    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
     --build-arg AIRFLOW_EXTRAS=gcp --tag my-image:0.0.1
 
 
 This builds the CI image in version 3.6 with "apache-beam" extra added.
 
 .. code-block:: bash
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+    --pull \
+    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
     --build-arg ADDITIONAL_AIRFLOW_EXTRAS="apache-beam" --tag my-image:0.0.1
 
 This builds the CI image in version 3.6 with "mssql" additional package added.
 
 .. code-block:: bash
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+    --pull \
+    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
     --build-arg ADDITIONAL_PYTHON_DEPS="mssql" --tag my-image:0.0.1
 
 This builds the CI image in version 3.6 with "gcc" and "g++" additional apt dev dependencies added.
 
 .. code-block::
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+    --pull \
+    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
     --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" --tag my-image:0.0.1
 
 This builds the CI image in version 3.6 with "jdbc" extra and "default-jre-headless" additional apt runtime dependencies added.
 
 .. code-block::
 
-  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+  DOCKER_BUILDKIT=1 docker build . -f Dockerfile.ci \
+    --pull \
+    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
     --build-arg AIRFLOW_EXTRAS=jdbc --build-arg ADDITIONAL_RUNTIME_DEPS="default-jre-headless" \
     --tag my-image:0.0.1
 
-CI Image manifests

Review comment:
       No need for the image manifest either :). We needed it to determine how "big" the difference was between the current Dockerfile/image we had locally and the one available as cache. This was mainly because of a catch-22: you had to pull the image first to use it as cache, and you could not really inspect the content of the remote image before pulling it without authenticating (no public API for that). The manifest was really implemented to overcome that limitation.
   
   With BuildKit cache, each layer is checked separately and independently of the others without actually pulling the image. This is nicely visualized during the build process, where the CACHE/REBUILD status is shown very quickly without actually pulling any image. This way we do not have to worry at all about how "big" the difference is vs. the locally rebuilt image. If the difference is "big" the necessary layers will be pulled; if we have modified the image or copied files in a way that makes them differ from the cache, they will be rebuilt. All super-optimised.







[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782055714



##########
File path: scripts/ci/libraries/_build_images.sh
##########
@@ -254,128 +276,6 @@ function build_images::check_for_docker_context_files() {
     fi
 }
 
-# Builds local image manifest. It contains only one random file generated during Docker.ci build

Review comment:
       All this is gone when we have no manifest image.







[GitHub] [airflow] github-actions[bot] commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1011290969


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.





[GitHub] [airflow] potiuk edited a comment on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005562336


   > Whats buildx vs buildkit?
   
   Essentially the same. Buildx is a plugin you can install in Docker to make more BuildKit capabilities available through the "docker buildx build" command (and a number of management commands). You do not need it to run "buildkit-enabled" builds (`DOCKER_BUILDKIT=1` is enough in Docker 18.09+), but you do need it, for example, to build and push cache to the GitHub registry.
   
   An example of that is our prod image. It is a multi-segmented image, so in order to prepare a good cache for the builds I need to do it with a command similar to `docker buildx build . --cache-to=type=registry,ref=ghcr.io/,........,mode=max`
   
   This was (about a year ago, when I last checked) missing in BuildKit - you had to use some strange combination of then-unreleased tools from "moby" to build and refresh the cache - but now with the plugin it's a "breeze" to manage and prepare the cache.
   
   In our case users will not have to install the buildx plugin, but it will have to be available (eventually) on our self-hosted runners to refresh the cache on main builds (I will add it) and any time you want to manually refresh the cache with `./breeze prepare-build-cache`.
   
   https://docs.docker.com/engine/reference/commandline/buildx_build/
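
The split described above (plain BuildKit for ordinary builds, buildx for exporting cache to a registry) can be sketched as a small helper. This is only an illustration: the function name `buildkit_command_for` and the image reference are made up, and the helper prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints (does not execute) the docker command for a purpose.
# The reference "ghcr.io/example-org/example-image:cache" below is a made-up name.
buildkit_command_for() {
    local purpose="$1" cache_ref="${2:-}"
    case "${purpose}" in
        local-build)
            # A plain BuildKit build needs no buildx plugin.
            echo "DOCKER_BUILDKIT=1 docker build ."
            ;;
        refresh-cache)
            # Exporting the cache to a registry goes through the buildx plugin.
            echo "docker buildx build . --cache-to=type=registry,ref=${cache_ref},mode=max"
            ;;
        *)
            echo "unknown purpose: ${purpose}" >&2
            return 1
            ;;
    esac
}

buildkit_command_for local-build
buildkit_command_for refresh-cache "ghcr.io/example-org/example-image:cache"
```

Printing the command keeps the sketch runnable on machines without Docker; a real script would execute it instead.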
   





[GitHub] [airflow] mik-laj commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r781741123



##########
File path: Dockerfile.ci
##########
@@ -98,8 +98,7 @@ RUN apt-get update \
 # Only copy mysql/mssql installation scripts for now - so that changing the other

Review comment:
       Should we add a frontend definition to this file also?
   ```
   # syntax=docker/dockerfile:1.3
   ```







[GitHub] [airflow] ashb commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1008834707


   FYI "Video can't be played because the file is corrupt"





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r781170203



##########
File path: Dockerfile
##########
@@ -166,10 +166,18 @@ ARG INSTALL_PROVIDERS_FROM_SOURCES="false"
 # But it also can be `.` from local installation or GitHub URL pointing to specific branch or tag

Review comment:
       Added







[GitHub] [airflow] potiuk closed pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20664:
URL: https://github.com/apache/airflow/pull/20664


   





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005567519


   It also has all the nice things about building multi-platform images. The nice thing about buildx is that you can have multiple builders - for example different builders for different platforms, or a "build cache server" where you build such caches - and organize them so that each builder is completely separated from the Docker engine it runs on. Each builder runs as a separate container and has its own private image storage, so when the builder builds an image, it is not visible via `docker images`.
   
   Pretty nice solution to organize your builds when you have a multi-platform, multi-branch, multi-whatever case. It initially seems much more complex than the original docker build system, but it is actually very intuitive.
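
As a rough sketch of the "one builder per platform" idea, the helper below prints the two buildx commands involved: creating a named builder with the container-based driver, then building with it. The function name and the builder name `arm-builder` are hypothetical, and nothing is executed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the commands that would set up and use a
# dedicated builder for one platform. The builder name is made up.
print_builder_setup() {
    local builder_name="$1" platform="$2"
    # The docker-container driver runs the builder as its own container.
    printf 'docker buildx create --name %s --driver docker-container --platform %s\n' \
        "${builder_name}" "${platform}"
    # Select that builder explicitly when building.
    printf 'docker buildx build --builder %s --platform %s .\n' \
        "${builder_name}" "${platform}"
}

print_builder_setup arm-builder linux/arm64
```

An image built this way stays in the builder's own storage unless it is explicitly loaded or pushed, which matches the note above about `docker images`.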





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782044205



##########
File path: Dockerfile
##########
@@ -327,34 +329,6 @@ RUN if [[ -f /docker-context-files/requirements.txt ]]; then \
         pip install --no-cache-dir --user -r /docker-context-files/requirements.txt; \
     fi
 

Review comment:
       BTW. We do not need those labels any more because we are no longer pushing the 'build' segment to our repo. This is yet another great feature of the buildkit cache: you can build it with "mode=max", in which case the cache contains layers for all segments - the final and intermediate ones.







[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r781168020



##########
File path: Dockerfile
##########
@@ -166,10 +166,18 @@ ARG INSTALL_PROVIDERS_FROM_SOURCES="false"
 # But it also can be `.` from local installation or GitHub URL pointing to specific branch or tag

Review comment:
       > Should we add information about frontend?
   
   Good idea: 
   
   > To use the external frontend, the first line of your Dockerfile needs to be `# syntax=docker/dockerfile:1.3` pointing to the specific image you want to use.
   
   > BuildKit also ships with Dockerfile frontend builtin but it is recommended to use an external image to make sure that all users use the same version on the builder and to pick up bugfixes automatically without waiting for a new version of BuildKit or Docker engine.
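
Since the directive only counts when it is the first line of the file, a quick check along these lines could confirm that a Dockerfile declares a frontend. This is a minimal sketch; the function name `has_syntax_directive` is made up and is not part of the PR.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds only if the first line of the given file
# is a "# syntax=<image>" frontend directive.
has_syntax_directive() {
    local dockerfile="$1" first_line=""
    local pattern='^#[[:space:]]*syntax=[^[:space:]]+'
    # Read just the first line; tolerate a file without a trailing newline.
    IFS= read -r first_line < "${dockerfile}" || true
    [[ "${first_line}" =~ ${pattern} ]]
}

# Usage: has_syntax_directive Dockerfile && echo "frontend pinned"
```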
   







[GitHub] [airflow] potiuk merged pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #20664:
URL: https://github.com/apache/airflow/pull/20664


   





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r786712445



##########
File path: scripts/ci/libraries/_build_images.sh
##########
@@ -800,6 +624,12 @@ function build_images::build_prod_images() {
         echo
         exit 1
     fi
+    if [[ ${PREPARE_BUILDX_CACHE} == "true" ]]; then
+        # Cache for prod image contains also build stage for buildx when mode=max specified!
+        docker_cache_prod_directive+=(
+            "--cache-to=type=registry,ref=${AIRFLOW_PROD_IMAGE}:cache,mode=max"

Review comment:
       Yep. I did consider it, but the size of the GHA cache is FAR too small for us: 10GB will easily be exceeded 3x by the latest CI + prod images plus the v2-2 and v2-3 branches, and we have a lot more caches that are directly used by GHA (for various virtualenvs and the like). Our images are huge (for a good reason though). Also, the PROD image cache is far bigger than the PROD image itself because it contains the "build" stage cache, which is many times bigger than the PROD image.
   
   > Github Actions cache saves both cache metadata and layers to GitHub's Cache service. This cache currently has a size limit of 10GB that is shared across different caches in the repo. If you exceed this limit, GitHub will save your cache but will begin evicting caches until the total size is less than 10 GB. Recycling caches too often can result in slower runtimes overall.
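
The conditional from the diff hunk can be sketched in isolation as follows. The function name `build_cache_directives` and its arguments are assumptions made for illustration: the registry cache is always read, and written back (with `mode=max`) only when a cache refresh is requested.

```shell
#!/usr/bin/env bash
# Hypothetical stand-alone version of the conditional shown in the diff.
# $1: image name (made-up values in the usage below), $2: "true" to export cache.
build_cache_directives() {
    local image="$1" prepare_cache="${2:-false}"
    # Always read from the registry cache.
    local directives=("--cache-from=type=registry,ref=${image}:cache")
    if [[ ${prepare_cache} == "true" ]]; then
        # mode=max also exports layers of intermediate (build) stages.
        directives+=("--cache-to=type=registry,ref=${image}:cache,mode=max")
    fi
    printf '%s\n' "${directives[@]}"
}

build_cache_directives "ghcr.io/example-org/example-image" "true"
```

Keeping the directives in an array, as the original script does, lets them be passed to the build command without any word-splitting surprises.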







[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1008341336


   BTW. This is what image building will look like when this one is merged:
   
   https://user-images.githubusercontent.com/595491/148693440-2af4dc38-06e2-4fd5-8bab-4d21170ddea2.mp4
   
   It's a great improvement - both UI and speed/performance wise. There is no need to pull the whole image before using it as a cache: it automatically determines which layers can be pulled and which need to be rebuilt, it uses multiple processors when several layers are rebuilt, and it does everything in one step! It is also supported natively by Codespaces, so we will be able to make the image work for Codespaces (a PR for that is coming too).





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005562336


   > Whats buildx vs buildkit?
   
   Essentially the same. Buildx is a plugin you can install in Docker to make more BuildKit capabilities available through the "docker buildx build" command. You do not need it to run "buildkit" builds (`DOCKER_BUILDKIT=1` is enough in Docker 18.09+), but you do need it, for example, to build and push cache to the GitHub registry.
   
   An example of that is our prod image. It is a multi-segmented image, so in order to prepare a good cache for the builds I need to do it with a command similar to `docker buildx build . --cache-to=type=registry,ref=ghcr.io/,........,mode=max`
   
   This was (about a year ago, when I last checked) missing in BuildKit - you had to use some strange combination of then-unreleased tools from "moby" to build and refresh the cache - but now with the plugin it's a "breeze" to manage and prepare the cache.
   
   In our case users will not have to install the buildx plugin, but it will have to be available (eventually) on our self-hosted runners to refresh the cache on main builds (I will add it) and any time you want to manually refresh the cache with `./breeze prepare-build-cache`.
   
   https://docs.docker.com/engine/reference/commandline/buildx_build/
   





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1005437311


   This is a VAST simplification of our docker build caching (using the modern BuildKit plugin). I have to do some more testing with it, but on top of removing several hundred lines of Bash that I implemented when BuildKit did not have all the capabilities needed, it's also very stable and robust and opens the path to multi-architecture images (for ARM).
   
   I would do some more testing - but I would also love to merge #20238 (optimization of Dockerfiles) that this build is based on.





[GitHub] [airflow] potiuk commented on a change in pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#discussion_r782060062



##########
File path: scripts/ci/libraries/_spinner.sh
##########
@@ -1,55 +0,0 @@
-#!/usr/bin/env bash

Review comment:
       Rather than showing our animated spinner during the build I chose the option to simply show the buildkit output directly in the terminal. It's nice, colorful, animated and optimized to show only relevant information and take as little space as possible during the build. So we can remove the animated spinner now. 
   
   Also, during pre-commit I opted to just show a warning that the image is not up to date rather than try to build it. The user can build it when entering breeze or by a (suggested) manual action. This simplifies interaction in pre-commit, where the terminal is difficult to get hold of.
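
A "warn, do not rebuild" check could look roughly like this. It is only a sketch under assumptions: comparing a stored checksum of the Dockerfile with the current one is a hypothetical stand-in for however the real scripts detect an outdated image, and the function name is made up.

```shell
#!/usr/bin/env bash
# Hypothetical pre-commit style check: prints a warning when the Dockerfile
# changed since the checksum was stored, but never fails or rebuilds.
warn_if_image_outdated() {
    local dockerfile="$1" stored_checksum_file="$2"
    local current stored=""
    current="$(cksum < "${dockerfile}")"
    if [[ -f "${stored_checksum_file}" ]]; then
        stored="$(< "${stored_checksum_file}")"
    fi
    if [[ "${current}" != "${stored}" ]]; then
        echo "WARNING: the image may be outdated; consider rebuilding it manually." >&2
    fi
    # Warning only: the hook itself should not be blocked.
    return 0
}

# Usage: warn_if_image_outdated Dockerfile.ci .build/dockerfile.cksum
```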







[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1009876584


   I also added some comments explaining why certain things were there and why we can remove them now :). I hope it will make it easier to review it as some of the decisions/reasoning was only in my head (luckily a lot of that can go to the trash bin now :D) 





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1009786917


   Actually, we will be able to remove 3/5 of our airflow images from ghcr.io:
   
   * Python base images
   * Build images
   * Manifest images 
   
   The only remaining ones will be plain PROD and CI images.
   
   We will be able to remove all of those (after I cherry-pick those changes to the 2.2 line).





[GitHub] [airflow] potiuk commented on pull request #20664: Switch to buildx to build airflow images

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20664:
URL: https://github.com/apache/airflow/pull/20664#issuecomment-1011391486


   FYI. I will run a few more tests in my fork - just to make sure everything is fine with caching of build-image workflow results - and will merge it when I come back from Slovakia (going for a few days), so that I am around to support any problems with Breeze. So there is still some time for reviews :)

