You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 07:15:08 UTC

[GitHub] [airflow] potiuk opened a new pull request #20238: Optimize dockerfiles for local rebuilds

potiuk opened a new pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   Optimize dockerfiles for local rebuilds
       
   This is one of the last final refactorings of the image before
   it is eligible to become an "official image".
   
   The project for that is here with few remaining issues:
   
   https://github.com/apache/airflow/projects/3
   
   When you build dockerfiles locally for development the layer
   invalidation could happen earlier than you wanted - some of the
   variables (like COMMIT_SHA) were affecting the cache of Docker
   in the way that they forced either invalidation of the pre-cached
   packages installed or forced to recreate assets when they were
   not touched.
   
   Some of the variables there were not needed (for
   example PIP_INSTALL_USER flag as well.
   
   PIP was upgraded to version 21.3.1 (latest released)
   
   AIRLFOW_INSTALL_USER_FLAG is not needed - we always install
   Airflow with --user flag in Production Docker file and this
   is controlled by PIP_USER variable.
   
   In some places `--upgrade` flag for pip were used where it had
   no real usage (when airflow was installed in specific version)
   this has been also cleaned up as part of the PR.
   
   Additional PIP version diagnostics is printed to make sure
   that the right version of PIP from the right PATH is used.
   
   All `stderr` output has been carefully removed (except the
   warning of PIP which is accompanied by a reassuring message
   that we know what we are doing.
   
   Fixes: #20259
   
   This PR improves the experience of iterating over docker image
   building by decreasing unnecesary layer invalidations.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-998780796


   > python2 is still in the DockerFile , maybe the opportunity to remove it ?
   
   It's there for a reason. Python2 + Python Virtualenv is our offer to the users who still (?) have a need to run Python2 code. 
   
   It's part of the migration process we have:
   * https://airflow.apache.org/docs/apache-airflow/stable/upgrading-from-1-10/index.html#step-1-switch-to-python-3
   
   We have a lot of enterprise users who are (hopefully) going to move away soon from it. And since it does not cost us much, we have Python 2 added to our image and even tests in our CI that test if PythonVirtualenv work for Python2.
   
   I think removal of Python 2 from our tests/image shoul be done after voting that we want to drop it. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r775205798



##########
File path: Dockerfile
##########
@@ -62,6 +65,7 @@ ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} \
 
 # Install curl and gnupg2 - needed for many other installation steps
 RUN apt-get update \
+    && apt-get install --no-install-recommends -yqq apt-utils >/dev/null 2>&1 \

Review comment:
       Why are we installing this as a separate command and not together with the next step?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-999186665


   I would love some reviews - happy to split off some things (though it would be rather difficult to split).
   
   But it would be great to get this one in to (almost) end the prod image story. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1005581275


   Question: Should I try to split this one into smaller ones? Seems that it did not get too much of interest so far. Maybe trying to split will help :D ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1004146533


   Hey @uranusjr  @mik-laj - any more comments ? I want to merge that one as I am also working on "codespaces" support and need to convert build for the image to use buildx, so I would love to base it on this change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] malthe commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
malthe commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009747067


   FWIW, the gzipped size comes to about 300 megs. I suppose when the image is mapped to a filesystem then it has to uncompress each layer. There's some discussion about that here: https://github.com/moby/moby/issues/24515.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r776088296



##########
File path: scripts/docker/compile_www_assets.sh
##########
@@ -35,8 +34,26 @@ function compile_www_assets() {
         www_dir="$(python -m site --user-site)/airflow/www"
     fi
     pushd ${www_dir} || exit 1
-    yarn install --frozen-lockfile --no-cache
-    yarn run prod
+    set +e
+    yarn install --frozen-lockfile --no-cache 2>/tmp/out-yarn-install.txt
+    local res=$?
+    if [[ ${res} != 0 ]]; then
+        >&2 echo
+        >&2 echo "Error when running yarn install:"
+        >&2 echo
+        >&2 cat /tmp/out-yarn-install.txt
+        exit 1
+    fi
+    yarn run prod 2>/tmp/out-yarn-run.txt
+    res=$?
+    if [[ ${res} != 0 ]]; then
+        >&2 echo
+        >&2 echo "Error when running yarn install:"
+        >&2 echo
+        >&2 cat /tmp/out-yarn-run.txt
+        exit 1
+    fi
+    set -e
     find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${md5sum_file}"

Review comment:
       Yep.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r776085431



##########
File path: Dockerfile
##########
@@ -62,6 +65,7 @@ ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} \
 
 # Install curl and gnupg2 - needed for many other installation steps

Review comment:
       Removed it. It's not needed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1008114804


   This one is also updated - based on improved #20744 and #20747


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-993854408


   Looking forward to reviews on that one- it should be helpful in finalizing `AIP-26` (finally)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1002313050


   All should be fixed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r775205757



##########
File path: Dockerfile
##########
@@ -62,6 +65,7 @@ ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} \
 
 # Install curl and gnupg2 - needed for many other installation steps

Review comment:
       Comment needs updating.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-996272750


   Anyone :) ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r775207313



##########
File path: scripts/docker/compile_www_assets.sh
##########
@@ -35,8 +34,26 @@ function compile_www_assets() {
         www_dir="$(python -m site --user-site)/airflow/www"
     fi
     pushd ${www_dir} || exit 1
-    yarn install --frozen-lockfile --no-cache
-    yarn run prod
+    set +e
+    yarn install --frozen-lockfile --no-cache 2>/tmp/out-yarn-install.txt
+    local res=$?
+    if [[ ${res} != 0 ]]; then
+        >&2 echo
+        >&2 echo "Error when running yarn install:"
+        >&2 echo
+        >&2 cat /tmp/out-yarn-install.txt
+        exit 1
+    fi
+    yarn run prod 2>/tmp/out-yarn-run.txt
+    res=$?
+    if [[ ${res} != 0 ]]; then
+        >&2 echo
+        >&2 echo "Error when running yarn install:"
+        >&2 echo
+        >&2 cat /tmp/out-yarn-run.txt
+        exit 1
+    fi
+    set -e
     find package.json yarn.lock static/css static/js -type f | sort | xargs md5sum > "${md5sum_file}"

Review comment:
       Should we remove `/tmp/out-yarn-run.txt` also?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1005876288


   > Easier to review, and plus you get more commits that way (it's a race right?)
   
   Already split :). Crossed my mind ideed. Though starting with #20664  and some of the upcoming changes from Breeze rewrite to Python I might actually start racing you with removed code.  About a time to get rid some of the accumulated stuff be employing new solutions that are better than when it started ;).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1005865526


   > Question: Should I try to split this one into smaller ones? Seems that it did not get too much of interest so far. Maybe trying to split will help :D ?
   
   Easier to review, and plus you get more commits that way (it's a race right?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r775206599



##########
File path: Dockerfile.ci
##########
@@ -170,16 +165,16 @@ RUN mkdir -pv /usr/share/man/man1 \
     && mkdir -pv /usr/share/man/man7 \
     && export ${ADDITIONAL_DEV_APT_ENV?} \
     && export ${ADDITIONAL_RUNTIME_APT_ENV?} \
-    && bash -o pipefail -e -u -x -c "${RUNTIME_APT_COMMAND}" \
-    && bash -o pipefail -e -u -x -c "${ADDITIONAL_RUNTIME_APT_COMMAND}" \
+    && bash -o pipefail -e -u -c "${RUNTIME_APT_COMMAND}" \
+    && bash -o pipefail -e -u -c "${ADDITIONAL_RUNTIME_APT_COMMAND}" \
     && apt-get update \
     && apt-get install --no-install-recommends -y \
       ${RUNTIME_APT_DEPS} \
       ${ADDITIONAL_RUNTIME_APT_DEPS} \
     && apt-get autoremove -yqq --purge \
     && apt-get clean \
     && rm -rf /var/lib/apt/lists/* \
-    && curl https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_CLI_VERSION}.tgz \
+    && curl --silent https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_CLI_VERSION}.tgz \

Review comment:
       ```suggestion
       && curl --silent "https://download.docker.com/linux/static/stable/x86_64/docker-${DOCKER_CLI_VERSION}.tgz" \
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009762408


   > FWIW, the gzipped size comes to about 300 megs. I suppose when the image is mapped to a filesystem then it has to uncompress each layer. There's some discussion about that here: [moby/moby#24515](https://github.com/moby/moby/issues/24515).
   
   Yeah - compressed size is much smaller - and that's the size that "matters" when the file gets pulled really. The moby discussion is about users who are already dynamically decompressing the data they store. This I think is kind of antipattern when you use images - the image layers are compressed by default - maybe not most efficient compression on the planet because it is a "generic" compression - but it does the job. in vast majority of cases I saw it is 1:3 compression rate for most binary data and 1:10 at least for text data (roughly - I never made a detailed calculation), But if someone attempts to store compressed data that reaches similar levels of compression in an already compressed image layer, it's a pretty much loss (unless you care about the final space used when image is decompressed and you decompress on-the-flight and never store the decompressed data). 
   
   So in essence I just take the compression done by container layers as "granted" and don't try to tweak around it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1002587313


   I think all solved @mik-laj !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-992619940


   This is one of the final optimizations/refactorings and cleanups of the image before I will submit it to become an official image. Some last optimizations/clarifications (cleaning up the stderr output etc. )
   
   If you think it is to big I can attempt to split it into smaller pieces, but there are a number of changes in there that invalidate current layers of Airflow images, making the cache less effective and merging it in one go would only require image refresh once. 
   
   Since the image is going to be "official" I had to make sure that all the potential "errors" are either gone or are explained. There re many requirements to fulfill to become an "official image" - and I think with that one, we are getting closer to having them all fulfilled (https://github.com/docker-library/official-images). 
   
   One thing that I had to do however is to add this reassuring message that we know what we are doing by using root and not using virtualenv. I tried to solve the problem in PIP or even PEP 668 level but I failed, so I had to revert to this reassuring message:
   
   ![Screenshot from 2021-12-13 16-55-29](https://user-images.githubusercontent.com/595491/145844895-96088d89-9778-4158-9c16-416b3ebaec06.png)
   
   Looking forward to reviews!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-998780796


   > python2 is still in the DockerFile , maybe the opportunity to remove it ?
   
   It's there for a reason. Python2 + Python Virtualenv is our offer to the users who still (?) have a need to run Python2 code. 
   
   It's part of the migration process we have:
   * https://airflow.apache.org/docs/apache-airflow/stable/upgrading-from-1-10/index.html#step-1-switch-to-python-3
   
   We have a lot of enterprise users who are (hopefully) going to move away soon from it. And since it does not cost us much, we have Python 2 added to our image and even tests in our CI that test if PythonVirtualenv work for Python2.
   
   I think removal of Python 2 from our tests/image should be done after voting that we want to drop it. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] raphaelauv commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
raphaelauv commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-998767490


   python2 is still in the DockerFile , maybe the opportunity to remove it ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-998791008


   I started PROPOSAL thread on the devlist: https://lists.apache.org/thread/rjyqw3cwsh4vgg6jycsbr1jr0slnych3 @raphaelauv . I think 2 years of EOL anniversary is a good time to put the final nail in the coffin for it. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009762408


   > FWIW, the gzipped size comes to about 300 megs. I suppose when the image is mapped to a filesystem then it has to uncompress each layer. There's some discussion about that here: [moby/moby#24515](https://github.com/moby/moby/issues/24515).
   
   Yeah - compressed size is much smaller - and that's the size that "matters" when the file gets pulled really. The moby discussion is about users who are already dynamically decompressing the data they store ( which I think is kind of antipattern when you use images - the image layers are compressed by default - maybe not most efficient compression on the planet because it is a "generic" compression - but it does the job. in vast majority of cases I saw it is 1:3 compression rate for most binary data and 1:10 at least for text data (roughly - I never made a detailed calculation), But if someone attempts to store compressed data that reaches similar levels of compression in an already compressed image layer, it's a pretty much loss (unless you care about the final space used when image is decompressed and you decompress on-the-flight and never store the decompressed data). 
   
   So in essence I just take the compression done by container layers as "granted" and don't try to tweak around it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009755883


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1008806195


   With this change I think I finally implemented an optimised sequence of layers for the CI image:
   
   1) Pre-install PYPI deps from main
   2) Install node deps if changed
   3) Install "current" PYPI deps if changed
   4) Compile www assets if asset files changed
   5) Copy sources from airflow
   
   This brings really optimized image building and witht the follow-up buildx #20258 change and build-kit --cache-from, it will be always minimum time needd to rebuilld exactly what needs to be rebuilt.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1008806195


   With this change I think I finally implemented an optimised sequence of layers for the CI image:
   
   1) Pre-install PYPI deps from main
   2) Install node deps if changed
   3) Install "current" PYPI deps if changed
   4) Compile www assets
   5) Copy sources from airflow
   
   This brings really optimized image building and witht the follow-up buildx #20258 change and build-kit --cache-from, it will be always minimum time needd to rebuilld exactly what needs to be rebuilt.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-1009731442


   I Look for some reviews. As part of the optimization I also reviewed the image with Dive (cc: @malthe @mik-laj ) and made sure that some of the remainig remnants that were "bloating" the image were removed
   
   * we had (unnecesary) PIP install in the final image - this caused (a small) number of .pyc files to be embedded in the image
   * we also had a lastlog produced during apt installl which had 15MB - I made sure it is removed as the last step of the RUN instruction that created it (thanks @malthe for pointing that out!).
   * I also reviewed and improved the instructions which copied the .local folder and performed permission  - one of the problems noted in #20776 that  there was no "group write" permission for the home directory of Airflow (which could be problematic in some open-shift cases). It had to be done carefully - changing of the permissions has to be done in the right place bacause changing the permission after the files are stored as layer effectively duplicates the layer (the new layer with pemissions creates effectively a copy o all the files) 😱 
   
   As result the efficiency score of our image jumped from 97% to 99%:
   
   ![Screenshot 2022-01-11 02 08 24](https://user-images.githubusercontent.com/595491/148911786-e520beb2-2dec-44d0-91b1-89f2acd377a2.png)
   
   I am thinking about adding some more automated tests for the presence of unwanted files and automating the tests for the image "efficiency" in our CI, but I would like to do it after this one and #20258  as switching to buildx significantly improves the experience of iterating over the images and building them in small increments. 
   
   Looking forward to reviews!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #20238:
URL: https://github.com/apache/airflow/pull/20238


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r776087280



##########
File path: Dockerfile
##########
@@ -62,6 +65,7 @@ ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} \
 
 # Install curl and gnupg2 - needed for many other installation steps
 RUN apt-get update \
+    && apt-get install --no-install-recommends -yqq apt-utils >/dev/null 2>&1 \

Review comment:
       Because we need curl/gpg2 to run DEV_APT_COMMAND  - and we can potentially override this commmand (and we need curl/gpg2 to be able to install "extra resources" if needed. I prefer to install curl/gpg2 using "official sources" first - to be able to add "new sources" later. Otherwise it would require anyhow to run the "apt update" two times - once with the oriignal sources to update curl/gpg2, downloading keys for the "secondary" sources, and running apt-update again. The optimisations we can get this way are minimal, and it's much better to separate the concerns - the ommand to run all of that in one RUN would be pretty long.. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#discussion_r776097411



##########
File path: Dockerfile
##########
@@ -62,6 +65,7 @@ ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE} \
 
 # Install curl and gnupg2 - needed for many other installation steps
 RUN apt-get update \
+    && apt-get install --no-install-recommends -yqq apt-utils >/dev/null 2>&1 \

Review comment:
       However - I think that is a good idea eventually and I merged this RUN with the next one.
   
   Also. I realized that you asked for the
   
   ```
       && apt-get install --no-install-recommends -yqq apt-utils >/dev/null 2>&1 \
   ```
   
   The main reson is that you need to install apt-utils to get rid of some warnings printed by apt-get. But I do not want to redirect stderr to /dev/null for all installation steps (just in case of some errors) - so i only redirect stderr in this one step (otherwise installing apt-utils would produce an stderr warning too).
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-992619940


   This is one of the final optimizations/refactorings and cleanups of the image before I will submit it to become an official image. Some last optimizations/clarifications (cleaning up the stderr output etc. )
   
   If you think it is to big I can attempt to split it into smaller pieces, but there are a number of changes in there that invalidate current layers of Airflow images, making the cache less effective and merging it in one go would only require image refresh once. 
   
   Since the image is going to be "official" I had to make sure that all the potential "errors" are either gone or are explained. There re many requirements to fulfill to become an "official image" - and I think with that one, we are getting closer to having them all fulfilled (https://github.com/docker-library/official-images). 
   
   One thing that I had to do however is to add this reassuring message that we know what we are doing by using root and not using virtualenv. I tried to solve the problem in PIP or even PEP 668 level but I failed, so I had t revert to this reassuring message:
   
   ![Screenshot from 2021-12-13 16-55-29](https://user-images.githubusercontent.com/595491/145844895-96088d89-9778-4158-9c16-416b3ebaec06.png)
   
   Looking forward to reviews!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #20238: Optimize dockerfiles for local rebuilds

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #20238:
URL: https://github.com/apache/airflow/pull/20238#issuecomment-999185866


   I updated the images to remove Python 2. 
   
   I also got rid of the warning from PIP in a different way. I complicated the Base image to add airflow user there (even if it is not needed), Just to make sure there are no warnings whe I pass it to the official image verification.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org