You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/06/06 18:44:25 UTC

[GitHub] [airflow] potiuk opened a new pull request #9162: Improve production image iteration speed

potiuk opened a new pull request #9162:
URL: https://github.com/apache/airflow/pull/9162


   For a long time the way how entrypoint worked in ci scripts
   was wrong. The way it worked was convoluted and short of black
   magic. This did not allow to pass multiple test targets and
   required separate execute command scripts in Breeze.
   
   This is all now straightened out and both production and
   CI image are always using the right entrypoint by default
   and we can simply pass parameters to the image as usual without
   escaping strings.
   
   This also allowed to remove some breeze commands and
   change names of several flags in Breeze to make them more
   meaningful.
   
   Both CI and PROD image have now embedded scripts for log
   cleaning.
   
   History of image releases is added for 1.10.10-*
   alpha quality images.
   
   ---
   Make sure to mark the boxes below before creating PR: [x]
   
   - [x] Description above provides context of the change
   - [x] Unit tests coverage for changes (not needed for documentation changes)
   - [x] Target Github ISSUE in description if exists
   - [x] Commits follow "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)"
   - [x] Relevant documentation is updated including usage instructions.
   - [x] I will engage committers as explained in [Contribution Workflow Example](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#contribution-workflow-example).
   
   ---
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   Read the [Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines) for more information.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436514184



##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -163,22 +148,19 @@ ssh-keyscan -H localhost >> ~/.ssh/known_hosts 2>/dev/null
 if [[ ${CI:=} == "true" && ${RUN_TESTS} == "true" ]] ; then
     echo
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-    echo "  Setting default parallellism to 2 because we can run out of memory during tests on CI"
+    echo "  Setting default parallelism to 2 because we can run out of memory during tests on CI"
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
     echo
-    export AIRFLOW__CORE__PARALELLISM=2
+    export AIRFLOW__CORE__PARALLELISM=2
 fi
 
+cd "${AIRFLOW_SOURCES}"
+
 set +u
 # If we do not want to run tests, we simply drop into bash
-if [[ "${RUN_TESTS}" == "false" ]]; then
-    if [[ ${#ARGS} == 0 ]]; then
-        exec /bin/bash
-    else
-        exec /bin/bash -c "$(printf "%q " "${ARGS[@]}")"
-    fi
+if [[ "${RUN_TESTS:=false}" != "true" ]]; then
+    exec /bin/bash "${@}"

Review comment:
       Are you sure of it? Should not be added --c here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640104246


   Hey @ashb @dimberman @schnie @jhtimmins I think you  might want to merge that one :).
   
   It is on the biggish side and I can split it into few independent PRs if you think that it makes sense, but I think this one is crucial to get fast iteration on the production image while we are integrating the helm chart with it. Yesterday when @ashb worked on the chart we had slack conversation and I could help fairly quickly in adding some changes to the entrypoint, i saw later @ashb mentioned lack of clear-logs as another blocker. So I realised we need to have a way so that not only me but all of us can actually build and release new images quickly. I am happy to help (and I have a bunch of changes in the Github issues but since we work sometimes in different timezones, I think it is important to get iterations quicker.
   
   This change implements a number of related improvements to Breeze to make it super easy to iterat with production images  - including production images from released version of Airflow - and allows to quickly release patched versions. I already released -1 yesterday for @ash and I am just pushing -2 with clear-logs.
   
   You can take a look at the IMAGES.rst changes where I described everything but for your use it should be as easy as:
   
   `./breeze --python 3.7 build-image --install-airflow-version 1.10.10 --production-image --use-local-cache`
   
   That's it. This will build production image out of PyPi 1.10.10 with the local Dockerfile and scripts - so you can easily add new scripts, use them in Dockerfile and build production image locally. Then you can re-tag and push them manually. 
   
   There are more changes in there for example once you build the prod image you can enter:
   
   `./breeze --production-image --python 3.7 shell bash` -> and you will be dropped in bash in the production image
   
   `./breeze --production-image --python 3.7 shell python` -> and you will be dropped in pyhton in the production image
   
   And few more things that I described in the description of the commit. 
   
   I also added "1.10.10-1, -2"  "release notes as well in the IMAGES.rst so that we can keep track of the changes while we work on them.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj edited a comment on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
mik-laj edited a comment on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640428080


   > entrypoint.sh → scripts/prod/entrypoint_prod.sh
   
   What do you think about entrypoint.sh → scripts/prod/entrypoint.sh
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640104246


   Hey @ashb @dimberman @schnie @jhtimmins I think you  might want to merge that one :).
   
   It is on the biggish side and I can split it into few independent PRs if you think that it makes sense, but I think this one is crucial to get fast iteration on the production image while we are integrating the helm chart with it. Yesterday when @ashb worked on the chart we had slack conversation and I could help fairly quickly in adding some changes to the entrypoint, i saw later @ashb mentioned lack of clear-logs as another blocker. So I realised we need to have a way so that not only me but all of us can actually build and release new images quickly. I am happy to help (and I have a bunch of changes in the GitHub issues but since we work sometimes in different timezones, I think it is important to get iterations quicker.
   
   This change implements a number of related improvements to Breeze to make it super easy to iterat with production images  - including production images from released version of Airflow - and allows to quickly release patched versions. I already released -1 yesterday for @ash and I am just pushing -2 with clear-logs.
   
   You can take a look at the IMAGES.rst changes where I described everything but for your use it should be as easy as:
   
   `./breeze --python 3.7 build-image --install-airflow-version 1.10.10 --production-image --use-local-cache`
   
   That's it. This will build production image out of PyPi 1.10.10 with the local Dockerfile and scripts - so you can easily add new scripts, use them in Dockerfile and build production image locally. Then you can re-tag and push them manually. 
   
   What's super important - the PROD image is now optimized for BOTH - speed of rebuilds AND size.  If you use the `--use-local-cache` switch it will rebuild very quickly next time you build it - even if you modify setup.py and add new dependencies. For me it's < 20-30 seconds when I tried. I think it removes the biggest obstacle when working on the prod image - i.e. how quickly you can iterate.
   
   There are more changes in there for example once you build the prod image you can enter:
   
   `./breeze --production-image --python 3.7 --install-airflow-version 1.10.10  shell bash` -> and you will be dropped in bash in the production image
   
   `./breeze --production-image --python 3.7 --install-airflow-version 1.10.10  shell python` -> and you will be dropped in python in the production image
   
   And few more things that I described in the description of the commit. 
   
   I also added "1.10.10-1, -2"  "release notes as well in the IMAGES.rst so that we can keep track of the changes while we work on them.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640104246


   Hey @ashb @dimberman @schnie @jhtimmins I think you  might want to merge that one :).
   
   It is on the biggish side and I can split it into few independent PRs if you think that it makes sense, but I think this one is crucial to get fast iteration on the production image while we are integrating the helm chart with it. Yesterday when @ashb worked on the chart we had slack conversation and I could help fairly quickly in adding some changes to the entrypoint, i saw later @ashb mentioned lack of clear-logs as another blocker. So I realised we need to have a way so that not only me but all of us can actually build and release new images quickly. I am happy to help (and I have a bunch of changes in the Github issues but since we work sometimes in different timezones, I think it is important to get iterations quicker.
   
   This change implements a number of related improvements to Breeze to make it super easy to iterat with production images  - including production images from released version of Airflow - and allows to quickly release patched versions. I already released -1 yesterday for @ash and I am just pushing -2 with clear-logs.
   
   You can take a look at the IMAGES.rst changes where I described everything but for your use it should be as easy as:
   
   `./breeze --python 3.7 build-image --install-airflow-version 1.10.10 --production-image --use-local-cache`
   
   That's it. This will build production image out of PyPi 1.10.10 with the local Dockerfile and scripts - so you can easily add new scripts, use them in Dockerfile and build production image locally. Then you can re-tag and push them manually. 
   
   What's super important - the PROD image is now optimized for BOTH - speed of rebuilds AND size.  If you use the `--use-local-cache` switch it will rebuild very quickly next time you build it - even if you modify setup.py and add new dependencies. For me it's < 20-30 seconds when I tried. I think it removes the biggest obstacle when working on the prod image - i.e. how quickly you can iterate.
   
   There are more changes in there for example once you build the prod image you can enter:
   
   `./breeze --production-image --python 3.7 --install-airflow-version 1.10.10  shell bash` -> and you will be dropped in bash in the production image
   
   `./breeze --production-image --python 3.7 --install-airflow-version 1.10.10  shell python` -> and you will be dropped in pyhton in the production image
   
   And few more things that I described in the description of the commit. 
   
   I also added "1.10.10-1, -2"  "release notes as well in the IMAGES.rst so that we can keep track of the changes while we work on them.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ash commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
ash commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-641062231


   Guys. This is very annoying to get email notifications when you mistype the name of one of you.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r437550994



##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -134,10 +123,6 @@ if [[ ${INTEGRATION_KERBEROS:="false"} == "true" ]]; then
 fi
 
 
-# Start MiniCluster
-java -cp "/opt/minicluster-1.1-SNAPSHOT/*" com.ing.minicluster.MiniCluster \

Review comment:
       Exactly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640741641






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436891270



##########
File path: Dockerfile
##########
@@ -153,6 +153,23 @@ ENV PIP_VERSION=${PIP_VERSION}
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ARG AIRFLOW_EXTRAS
+ARG ADDITIONAL_AIRFLOW_EXTRAS=""
+ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_AIRFLOW_EXTRAS}
+
+# In case of Production uild image segment we want to pre-install master version of airflow

Review comment:
       Nope. It installis it in two steps (all happens in the build segment):
   1} It first pre-installs it from v1-10-test HEAD. This gives the "optimise build time" improvement - because it will only re-install what has changed
   2) then it installs airflow from PyPI including requirements.txt taken from GitHub  - same version as being installed from PyPi (so when we install 1.10.11 in the future it will install airflow with the requirements that were "snapshot" at 1.10.11 tagging time.
   
   This is all installed with --user flag in the build segment. And then the .local dir is copied to main segment. This way we  do not gave double layers from installing it first from Github v1-10-test and secondly from PyPI - we always copy the "final" set of install files as single layer.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436516086



##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -163,22 +148,19 @@ ssh-keyscan -H localhost >> ~/.ssh/known_hosts 2>/dev/null
 if [[ ${CI:=} == "true" && ${RUN_TESTS} == "true" ]] ; then
     echo
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-    echo "  Setting default parallellism to 2 because we can run out of memory during tests on CI"
+    echo "  Setting default parallelism to 2 because we can run out of memory during tests on CI"
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
     echo
-    export AIRFLOW__CORE__PARALELLISM=2
+    export AIRFLOW__CORE__PARALLELISM=2
 fi
 
+cd "${AIRFLOW_SOURCES}"
+
 set +u
 # If we do not want to run tests, we simply drop into bash
-if [[ "${RUN_TESTS}" == "false" ]]; then
-    if [[ ${#ARGS} == 0 ]]; then
-        exec /bin/bash
-    else
-        exec /bin/bash -c "$(printf "%q " "${ARGS[@]}")"
-    fi
+if [[ "${RUN_TESTS:=false}" != "true" ]]; then
+    exec /bin/bash "${@}"

Review comment:
       Yep. I am sure. We want to add -c manually if needed. If you do not add anything you are simply dropped into shell. This way you can also pass additional options to bash - not only -c.
   
   See the examples in BREEZE.rst 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #9162:
URL: https://github.com/apache/airflow/pull/9162


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436526224



##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -163,22 +148,19 @@ ssh-keyscan -H localhost >> ~/.ssh/known_hosts 2>/dev/null
 if [[ ${CI:=} == "true" && ${RUN_TESTS} == "true" ]] ; then
     echo
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-    echo "  Setting default parallellism to 2 because we can run out of memory during tests on CI"
+    echo "  Setting default parallelism to 2 because we can run out of memory during tests on CI"
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
     echo
-    export AIRFLOW__CORE__PARALELLISM=2
+    export AIRFLOW__CORE__PARALLELISM=2
 fi
 
+cd "${AIRFLOW_SOURCES}"
+
 set +u
 # If we do not want to run tests, we simply drop into bash
-if [[ "${RUN_TESTS}" == "false" ]]; then
-    if [[ ${#ARGS} == 0 ]]; then
-        exec /bin/bash
-    else
-        exec /bin/bash -c "$(printf "%q " "${ARGS[@]}")"
-    fi
+if [[ "${RUN_TESTS:=false}" != "true" ]]; then
+    exec /bin/bash "${@}"

Review comment:
       Right. I think it's not needed after recent HIVE mocking. Let's see what our tests have to say about it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640430697


   > > entrypoint.sh → scripts/prod/entrypoint_prod.sh
   > 
   > What do you think about entrypoint.sh → scripts/prod/entrypoint.sh
   
   I prefer to keep the name _prod - we also have a _ci one and _exec,sh. 
   
   It's very easy to make a mistake which  one you are editing (happened to me already several times). Adding explicit name makes it straightforward.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640799787






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb removed a comment on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
ashb removed a comment on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640741641


   It still feels very weird that we are building 1.10.10 images _not_ from the Dockerfile in the 1.10.10 branch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640104246


   Hey @ashb @dimberman @schnie @jhtimmins I think you  might want to merge that one :).
   
   It is on the biggish side and I can split it into few independent PRs if you think that it makes sense, but I think this one is crucial to get fast iteration on the production image while we are integrating the helm chart with it. Yesterday when @ashb worked on the chart we had slack conversation and I could help fairly quickly in adding some changes to the entrypoint, i saw later @ashb mentioned lack of clear-logs as another blocker. So I realised we need to have a way so that not only me but all of us can actually build and release new images quickly. I am happy to help (and I have a bunch of changes in the Github issues but since we work sometimes in different timezones, I think it is important to get iterations quicker.
   
   This change implements a number of related improvements to Breeze to make it super easy to iterat with production images  - including production images from released version of Airflow - and allows to quickly release patched versions. I already released -1 yesterday for @ash and I am just pushing -2 with clear-logs.
   
   You can take a look at the IMAGES.rst changes where I described everything but for your use it should be as easy as:
   
   `./breeze --python 3.7 build-image --install-airflow-version 1.10.10 --production-image --use-local-cache`
   
   That's it. This will build production image out of PyPi 1.10.10 with the local Dockerfile and scripts - so you can easily add new scripts, use them in Dockerfile and build production image locally. Then you can re-tag and push them manually. 
   
   What's super important - the PROD image is now optimized for BOTH - speed of rebuilds AND size.  If you use the `--use-local-cache` switch it will rebuild very quickly next time you build it - even if you modify setup.py and add new dependencies. For me it's < 20-30 seconds when I tried. I think it removes the biggest obstacle when working on the prod image - i.e. how quickly you can iterate.
   
   There are more changes in there for example once you build the prod image you can enter:
   
   `./breeze --production-image --python 3.7 shell bash` -> and you will be dropped in bash in the production image
   
   `./breeze --production-image --python 3.7 shell python` -> and you will be dropped in pyhton in the production image
   
   And few more things that I described in the description of the commit. 
   
   I also added "1.10.10-1, -2"  "release notes as well in the IMAGES.rst so that we can keep track of the changes while we work on them.
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436516446



##########
File path: scripts/ci/docker-compose/base.yml
##########
@@ -19,16 +19,10 @@ version: "2.2"
 services:
   airflow:
     image: ${AIRFLOW_IMAGE}
-    init: true
-    entrypoint: ["/bin/bash", "-c"]
     environment:
       - USER=root
       - ADDITIONAL_PATH=~/.local/bin
-      - HADOOP_DISTRO=cdh
-      - HADOOP_HOME=/opt/hadoop-cdh
-      - HADOOP_OPTS=-D/opt/krb5.conf
       - HIVE_HOME=/opt/hive

Review comment:
       ```suggestion
   ```
   I don't see any reference to this directory in Dockerfike.ci. Did I miss something?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436516895



##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -163,22 +148,19 @@ ssh-keyscan -H localhost >> ~/.ssh/known_hosts 2>/dev/null
 if [[ ${CI:=} == "true" && ${RUN_TESTS} == "true" ]] ; then
     echo
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
-    echo "  Setting default parallellism to 2 because we can run out of memory during tests on CI"
+    echo "  Setting default parallelism to 2 because we can run out of memory during tests on CI"
     echo " !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
     echo
-    export AIRFLOW__CORE__PARALELLISM=2
+    export AIRFLOW__CORE__PARALLELISM=2
 fi
 
+cd "${AIRFLOW_SOURCES}"
+
 set +u
 # If we do not want to run tests, we simply drop into bash
-if [[ "${RUN_TESTS}" == "false" ]]; then
-    if [[ ${#ARGS} == 0 ]]; then
-        exec /bin/bash
-    else
-        exec /bin/bash -c "$(printf "%q " "${ARGS[@]}")"
-    fi
+if [[ "${RUN_TESTS:=false}" != "true" ]]; then
+    exec /bin/bash "${@}"

Review comment:
       This way you can do "./breeze shell -- -c "ls" or "./docker run IMAGE bash -c "ls" which is exactly the way how Bash's docker is used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640428080


   > entrypoint.sh → scripts/prod/entrypoint_prod.sh
   
   WDYT about entrypoint.sh → scripts/prod/entrypoint.sh
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-640424948


   I would love to get that one merged so that we can improve the prod images faster


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#issuecomment-644649141


   Only Quarantine failed. If there are no more comments I am emerging this one shortly @feluelle - you wanted to take a look as well ?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #9162: Improve production image iteration speed

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #9162:
URL: https://github.com/apache/airflow/pull/9162#discussion_r436843199



##########
File path: Dockerfile
##########
@@ -153,6 +153,23 @@ ENV PIP_VERSION=${PIP_VERSION}
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ARG AIRFLOW_EXTRAS
+ARG ADDITIONAL_AIRFLOW_EXTRAS=""
+ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_AIRFLOW_EXTRAS}
+
+# In case of Production uild image segment we want to pre-install master version of airflow

Review comment:
       ```suggestion
   # In case of Production build image segment we want to pre-install master version of airflow
   ```

##########
File path: Dockerfile
##########
@@ -153,6 +153,23 @@ ENV PIP_VERSION=${PIP_VERSION}
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ARG AIRFLOW_EXTRAS
+ARG ADDITIONAL_AIRFLOW_EXTRAS=""
+ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}${ADDITIONAL_AIRFLOW_EXTRAS:+,}${ADDITIONAL_AIRFLOW_EXTRAS}
+
+# In case of Production uild image segment we want to pre-install master version of airflow

Review comment:
       Also it's not master branch dependencies when doing a prod build, is it?

##########
File path: scripts/ci/in_container/run_ci_tests.sh
##########
@@ -18,15 +18,14 @@
 # shellcheck source=scripts/ci/in_container/_in_container_script_init.sh
 . "$( dirname "${BASH_SOURCE[0]}" )/_in_container_script_init.sh"
 
-# any argument received is overriding the default nose execution arguments:
-PYTEST_ARGS=( "$@" )
-
 echo
-echo "Starting the tests with those pytest arguments: ${PYTEST_ARGS[*]}"
+echo "Starting the tests with those pytest arguments:" "${@}"
 echo
+pwd
+ls -la

Review comment:
       Debug left over?

##########
File path: scripts/ci/in_container/entrypoint_ci.sh
##########
@@ -134,10 +123,6 @@ if [[ ${INTEGRATION_KERBEROS:="false"} == "true" ]]; then
 fi
 
 
-# Start MiniCluster
-java -cp "/opt/minicluster-1.1-SNAPSHOT/*" com.ing.minicluster.MiniCluster \

Review comment:
       Was this left over from Jame's removal of hive? Was it just erroring, but because of the `&` not stopping execution?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org