You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/28 11:17:14 UTC

[GitHub] [airflow] potiuk opened a new pull request #12685: Production images on CI are now built from packages

potiuk opened a new pull request #12685:
URL: https://github.com/apache/airflow/pull/12685


   So far, the production images of Airflow were using sources
   when they were built on CI. This PR changes that, to build
   airflow + providers packages first and install them
   rather than use sources as installation mechanism.
   
   Part of #12261
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-738883609


   I hope this time no Exit 137 and it gets green


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736726882


   @ashb -> I think you wanted this to happen and it's high time to get the prod image for tests (next step - the Dockerhub one) built from packages,
   
   I reviewed the list of extras installed by default and added a few - especially those that did not require any special dependencies (like http/ftp) - they worked previously but they would not work after changing to packages (only selected provider packages get installed!) 
   
   This is the list I came up with after the review. I based it on how "niche" particular providers are (but this is purely my perception, so I might be wrong).
   
   Included: "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
   
   The recommendation is that people build their own images, choose the extras they want, and add extra packages, they need (and our image fully supports customization). But there will be a group of people that will rely on our images for their own production usage, so I think it's the right time to standardize it. We certainly do not want to install all providers, but I do not think we have some clear guidelines of what should be in the reference image so I have to resort to the wisdom of crowd. Once we agree to some proposal here, I will send it to devlist to vote on it.
   
   Maybe better will be to list those which are left:
   Excluded: 
   * all apache providers:  Cassandra, druid, hdfs, hive, kylin livy, pig, pinot, spark, sqoop
   * cloudant
   * databricks
   * datadog
   * dingding
   * discord
   * exasol
   * facebook
   * imap
   * jdbc
   * jenkins
   * jira
   * microsoft.mssql
   * microsoft.winrm
   * mongo
   * odbc
   * openfaas
   * oracle
   * pagerduty
   * papermill
   * plexus
   * presto
   * qubole
   * salesforce
   * samba
   * segment
   * singularity
   * snowflake
   * sqlite
   * vertica
   * yandex
   * zendesk
   
   @ashb, @kaxil, @turbaszek, @mik-laj. @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 (also others)  -> can you please take a look and comment if you think this list is "good" for reference image of ours? Any proposals to move providers between "excluded" and "included" ?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537063108



##########
File path: LOCAL_VIRTUALENV.rst
##########
@@ -163,6 +163,31 @@ Activate your virtualenv, e.g. by using ``workon``, and once you are in it, run:
     cd airflow/www
     yarn build
 
+Developing Providers
+--------------------
+
+In Airflow 2.0 we introduced split of Apache Airflow into separate packages - there is one main
+apache-airflow package with core of Airflow and 70+ packages for all providers (external services
+and software Airflow can communicate with).
+
+Developing providers is part of Airflow development, but when you install airflow as editable in your local
+development environment, the corresponding provider packages will be also installed from PyPI. However, the
+providers will also be present in your "airflow/providers" folder. This might lead to confusion,
+which sources of providers are imported during development. It will depend on your
+environment's PYTHONPATH setting in general.
+
+In order to avoid the confusion, you can set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment to ``true``
+before running ``pip install`` command:
+
+.. code-block:: bash
+
+  INSTALL_PROVIDERS_FROM_SOURCES="true" pip install -U -e ".[devel,<OTHER EXTRAS>]" \

Review comment:
       Yes. If you do not provide the variable it will also install providers from PyPI and you end up with two versions of providers - one in sources and one installed via packages.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-739572539


   [The Workflow run](https://github.com/apache/airflow/actions/runs/404551048) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533774397



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       Haven't got an opinion (yet, can form one if needed) - just wondering why this was done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533767494



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       Because it takes much longer. That's the only reason. From observations, it takes usually 2-4 times longer to install an sdist package than .whl, And this test already runs for quite a few minutes, so I wanted to reduce the strain on CI
   
   Do you think @ashb we should? Do you expect any problems with sdist comparing to wheel? I can easily change it to install from sdist additionally to wheels, but it will make our tests quite a bit longer, so unless there is a good reason we suspect something wrong we should not do it. 
   
   The only reason I am installing sdist for providers is that the setup.py are generated there and they were wrong initially, but i was considering to drop that as well, as I do not expect any more problems there - sdist should behave same as .wh  if setup.py is there




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-739166883


   Same here @ashb  - it would be great to get one before RC as this way we actually test "production-like" setup.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537054443



##########
File path: LOCAL_VIRTUALENV.rst
##########
@@ -163,6 +163,31 @@ Activate your virtualenv, e.g. by using ``workon``, and once you are in it, run:
     cd airflow/www
     yarn build
 
+Developing Providers
+--------------------
+
+In Airflow 2.0 we introduced split of Apache Airflow into separate packages - there is one main
+apache-airflow package with core of Airflow and 70+ packages for all providers (external services
+and software Airflow can communicate with).
+
+Developing providers is part of Airflow development, but when you install airflow as editable in your local
+development environment, the corresponding provider packages will be also installed from PyPI. However, the
+providers will also be present in your "airflow/providers" folder. This might lead to confusion,
+which sources of providers are imported during development. It will depend on your
+environment's PYTHONPATH setting in general.
+
+In order to avoid the confusion, you can set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment to ``true``
+before running ``pip install`` command:
+
+.. code-block:: bash
+
+  INSTALL_PROVIDERS_FROM_SOURCES="true" pip install -U -e ".[devel,<OTHER EXTRAS>]" \

Review comment:
       With an editable install (`-e`) does the INSTALL_PROVIDERS_FROM_SOURCES flag do anything at all, as pip/setuptools doesn't copy the files, but install a "link" to the airflow tree (or so I thought)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735219234


   This is the change we discussed some time ago and captured in #12261. The Production image during CI build is now built from wheels rather than directly from sources to reflect a "real" installation case. 
   
   This change modifies the "CI" production image build to first:
   
   1) Using 'pip download" + constraints file it downloads all .wheel packages that are needed to install airflow with the chosen "extras" for the production image
   
   2) Buillds all providers packages that are selected for the production
   
   3) Builds airflow .whl package
   
   4) Prepares the production image using those wheel files rather than PyPI.
   
   This way this production image for development tests reflects the exact production "content" - all packages (including airflow) are installed, but at the same time we are using latest sources to build those packages, so the image can also be used in K8S tests because it is build from the current (PR) sources. 
   
   After preparing the image I am also checking if all the providers are installed as expected
   
   
   ![Screenshot from 2020-11-28 12-24-37](https://user-images.githubusercontent.com/595491/100514402-b5bd9500-3174-11eb-8c46-a4ecae05294f.png)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533834106



##########
File path: breeze
##########
@@ -2211,6 +2259,12 @@ function breeze::flag_local_file_mounting() {
 -l, --skip-mounting-local-sources
         Skips mounting local volume with sources - you get exactly what is in the
         docker image rather than your current local sources of Airflow.
+
+--skip-mounting-files

Review comment:
       Ah. I see - in the snipped from GH I saw the sources option displayed in the first line so I thought you were commenting on that. 
   
   There is no particular use case for that I can think of . It was just option I realized I missed when I was moving /dist to the right docker compose file, so I just added it. 
   
   But now that you mentioned it, indeed it can be removed - I checked that we never switch it off in the CI anyway. So yeah. I am happy to remove it together with the MOUNT_FILES env variable. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533733644



##########
File path: .github/workflows/build-images-workflow-run.yml
##########
@@ -336,7 +336,8 @@ jobs:
         if: steps.defaults.outputs.proceed == 'true'
       - name: "Build CI images ${{ matrix.python-version }}:${{ github.event.workflow_run.id }}"
         run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
-        if: matrix.image-type == 'CI' && steps.defaults.outputs.proceed == 'true'
+        # locally built CI image is needed to prepare packages for PROD image build

Review comment:
       What does this do to run time of BuildImage worflows?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-739569917


   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest master at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533769785



##########
File path: breeze
##########
@@ -2211,6 +2259,12 @@ function breeze::flag_local_file_mounting() {
 -l, --skip-mounting-local-sources
         Skips mounting local volume with sources - you get exactly what is in the
         docker image rather than your current local sources of Airflow.
+
+--skip-mounting-files

Review comment:
       We need it badly. The problem is that when you install from wheels AND have sources mounted, python interpreter behaves differently in python 3 is to look for airflow in the current working directory. so if you mount sources AND have packages installed, behaviour depends on which directory is your cwd. This is bad and confusing. 
   
   On the other hand there are cases where you do need the sources  - even if you install the packages, for example when you want to test setup.py behaviour with installed dependencies. It helped me a lot with my dependency work. So we need that option.
   
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736748854


   > This looks fine to me, but I miss a list of these packages on [Docherhub](https://hub.docker.com/r/apache/airflow) or any other documentation. What do we have to do to add this?
   
   IT is there in the image (but yes - we need to add them to Readme automatically via pre-commit so that it appears in DockerHub


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537064297



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi
+    echo
+    installable_files=()
+    for file in /dist/*.{whl,tar.gz}
+    do
+        if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" && ${file} == "apache?airflow-[0-9]"* ]]; then
+            # Skip Apache Airflow package - it's just been installed above with extras
+            echo "Skipping ${file}"
+            continue
+        fi
+        if [[ (${PACKAGE_FORMAT} == "wheel" || ${PACKAGE_FORMAT} == "both") && ${file} == *".whl" ]]; then
+            echo "Adding ${file} to install"
+            installable_files+=( "${file}" )
+        fi
+        if [[ (${PACKAGE_FORMAT} == "sdist" || ${PACKAGE_FORMAT} == "both") && ${file} == *".tar.gz" ]]; then
+            echo "Adding ${file} to install"
+            installable_files+=( "${file}" )
+        fi

Review comment:
       Correct. I removed it and added if to error (with RED COLOR) if 'both' is used in this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537055160



##########
File path: breeze
##########
@@ -1473,6 +1477,10 @@ function breeze::prepare_formatted_versions() {
         tr ',' ' ' | fold -w "${indented_screen_width}" -s | sed "s/ /,/g; s/^/${list_prefix}/")
     readonly FORMATTED_TEST_TYPES
 
+    FORMATTED_PACKAGE_FORMATS=$(echo "${_breeze_allowed_package_formats=""}" |
+        tr ',' ' ' | fold -w "${indented_screen_width}" -s | sed "s/ /,/g; s/^/${list_prefix}/")

Review comment:
       `tr ',' ' '` isn't needed I don't thnk, as `_breeze_allowed_package_formats` is
   
   ```bash
   _breeze_allowed_package_formats="wheel sdist both"
   ```
   
   i.e. it doesn't contain commas.

##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi
+    echo
+    installable_files=()
+    for file in /dist/*.{whl,tar.gz}
+    do
+        if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" && ${file} == "apache?airflow-[0-9]"* ]]; then
+            # Skip Apache Airflow package - it's just been installed above with extras
+            echo "Skipping ${file}"
+            continue
+        fi
+        if [[ (${PACKAGE_FORMAT} == "wheel" || ${PACKAGE_FORMAT} == "both") && ${file} == *".whl" ]]; then
+            echo "Adding ${file} to install"
+            installable_files+=( "${file}" )
+        fi
+        if [[ (${PACKAGE_FORMAT} == "sdist" || ${PACKAGE_FORMAT} == "both") && ${file} == *".tar.gz" ]]; then
+            echo "Adding ${file} to install"
+            installable_files+=( "${file}" )
+        fi

Review comment:
       Installing from both formats doesn't really make sense -- that's going to install each provider twice. I don't really know what pip would do here (I guess install it once, then replace it/overwrite it with the other format)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537054053



##########
File path: Dockerfile.ci
##########
@@ -330,8 +330,8 @@ RUN if [[ ${INSTALL_AIRFLOW_VIA_PIP} == "true" ]]; then \
 # they are also installed additionally to whatever is installed from Airflow.
 COPY docker-context-files/ /docker-context-files/
 
-RUN if [[ ${AIRFLOW_LOCAL_PIP_WHEELS} != "true" ]]; then \
-        if ls /docker-context-files/*.whl 1> /dev/null 2>&1; then \
+RUN if [[ ${INSTALL_FROM_DOCKER_CONTEXT_FILES} != "true" ]]; then \
+        if ls /docker-context-files/*.{whl,tar.gz} 1> /dev/null 2>&1; then \
             pip install --no-deps /docker-context-files/*.whl; \

Review comment:
       ```suggestion
               pip install --no-deps /docker-context-files/*.{whl,tar.gz}; \
   ```
   
   You are listing for .tar.gz, but only installing .whl otherwise.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736726882


   @ashb -> I think you wanted this to happen and it's high time to get the prod image for tests (next step - the Dockerhub one) built from packages,
   
   I reviewed the list of extras installed by default and added a few - especially those that did not require any special dependencies (like http/ftp) - they worked previously but they would not work after changing to packages (only selected provider packages get installed!) 
   
   This is the list I came up with after the review. I based it on how "niche" particular providers are (but this is purely my perception, so I might be wrong).
   
   Included: "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
   
   The recommendation is that people build their own images, choose the extras they want, and add extra packages, they need (and our image fully supports customization). But there will be a group of people that will rely on our images for their own production usage, so I think it's the right time to standardize it. We certainly do not want to install all providers, but I do not think we have some clear guidelines of what should be in the reference image so I have to resort to the wisdom of crowd. Once we agree to some proposal here, I will send it to devlist to vote on it.
   
   Maybe better will be to list those which are left out.
   
   Excluded: 
   * all apache providers:  Cassandra, druid, hdfs, hive, kylin livy, pig, pinot, spark, sqoop
   * cloudant
   * databricks
   * datadog
   * dingding
   * discord
   * exasol
   * facebook
   * imap
   * jdbc
   * jenkins
   * jira
   * microsoft.mssql
   * microsoft.winrm
   * mongo
   * odbc
   * openfaas
   * oracle
   * pagerduty
   * papermill
   * plexus
   * presto
   * qubole
   * salesforce
   * samba
   * segment
   * singularity
   * snowflake
   * sqlite
   * vertica
   * yandex
   * zendesk
   
   @ashb, @kaxil, @turbaszek, @mik-laj. @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 (also others)  -> can you please take a look and comment if you think this list is "good" for reference image of ours? Any proposals to move providers between "excluded" and "included" ?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533774989



##########
File path: breeze
##########
@@ -2211,6 +2259,12 @@ function breeze::flag_local_file_mounting() {
 -l, --skip-mounting-local-sources
         Skips mounting local volume with sources - you get exactly what is in the
         docker image rather than your current local sources of Airflow.
+
+--skip-mounting-files

Review comment:
       This is not sources though - it's controlling if `/files` and `/dist` are mounted. They seems safe to do always




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736748854


   > This looks fine to me, but I miss a list of these packages on [Docherhub](https://hub.docker.com/r/apache/airflow) or any other documentation. What do we have to do to add this?
   
   IT is there in the image file (but yes - we need to add them to Readme automatically via pre-commit so that it appears in DockerHub


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533836451



##########
File path: breeze
##########
@@ -2211,6 +2259,12 @@ function breeze::flag_local_file_mounting() {
 -l, --skip-mounting-local-sources
         Skips mounting local volume with sources - you get exactly what is in the
         docker image rather than your current local sources of Airflow.
+
+--skip-mounting-files

Review comment:
       Removed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533634334



##########
File path: IMAGES.rst
##########
@@ -134,6 +134,15 @@ HEAD of development for constraints):
     pip install "https://github.com/apache/airflow/archive/<tag>.tar.gz#egg=apache-airflow" \
       --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt"
 
+You can also skip installing airflow by providing ``--install-airflow-version none``parameter to Breeze:

Review comment:
       ```suggestion
   You can also skip installing airflow by providing ``--install-airflow-version none`` parameter to Breeze:
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735410867


   [The Workflow run](https://github.com/apache/airflow/actions/runs/390053681) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533773125



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       BTW. I think we'd loose 4/5 times more than from optimising the extra step in build workflow. to be honest if we install .sdist. Do you think it's worth it ?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735226759


   [The Workflow run](https://github.com/apache/airflow/actions/runs/388616531) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #12685:
URL: https://github.com/apache/airflow/pull/12685


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533737690



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       Why don't we install airflow from the sdist?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735370306


   [The Workflow run](https://github.com/apache/airflow/actions/runs/389762297) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
mik-laj commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736747724


   >  can you please take a look and comment if you think this list is "good" for reference image of ours? Any proposals to move providers between "excluded" and "included" ?
   
   This looks fine to me, but I miss a list of these packages on [Docherhub](https://hub.docker.com/r/apache/airflow) or any other documentation. What do we have to do to add this?
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537063669



##########
File path: breeze
##########
@@ -1473,6 +1477,10 @@ function breeze::prepare_formatted_versions() {
         tr ',' ' ' | fold -w "${indented_screen_width}" -s | sed "s/ /,/g; s/^/${list_prefix}/")
     readonly FORMATTED_TEST_TYPES
 
+    FORMATTED_PACKAGE_FORMATS=$(echo "${_breeze_allowed_package_formats=""}" |
+        tr ',' ' ' | fold -w "${indented_screen_width}" -s | sed "s/ /,/g; s/^/${list_prefix}/")

Review comment:
       good point. I copied it from the line above (where it is also not needed and was copied from DEFAULT_PROD_EXTRAS. Removed for both.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r535359047



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       Is it good to go @ashb as is ? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537063213



##########
File path: LOCAL_VIRTUALENV.rst
##########
@@ -163,6 +163,31 @@ Activate your virtualenv, e.g. by using ``workon``, and once you are in it, run:
     cd airflow/www
     yarn build
 
+Developing Providers
+--------------------
+
+In Airflow 2.0 we introduced split of Apache Airflow into separate packages - there is one main
+apache-airflow package with core of Airflow and 70+ packages for all providers (external services
+and software Airflow can communicate with).
+
+Developing providers is part of Airflow development, but when you install airflow as editable in your local
+development environment, the corresponding provider packages will be also installed from PyPI. However, the
+providers will also be present in your "airflow/providers" folder. This might lead to confusion,
+which sources of providers are imported during development. It will depend on your
+environment's PYTHONPATH setting in general.
+
+In order to avoid the confusion, you can set ``INSTALL_PROVIDERS_FROM_SOURCES`` environment to ``true``
+before running ``pip install`` command:
+
+.. code-block:: bash
+
+  INSTALL_PROVIDERS_FROM_SOURCES="true" pip install -U -e ".[devel,<OTHER EXTRAS>]" \

Review comment:
       It is even described in the paragraph above actually.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735221049


   @ashb -> Since you were asking for it. The follow -up to this one will be to build the "DockerHub" images, but that will be a separate PR.
   
   Those will be built differently depending on the image:
   
   * The  DockerHub "master" images will be built in the same way as this PR (whl packages 'pip downloaded' for external dependencies and .whl packages built from current master sources
   
   * The "official" (non-master) images wil be built from the released PyPI packages rather than wheels,


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736726882


   @ashb -> I think you wanted this to happen and it's high time to get the prod image for tests (next step - the Dockerhub one) built from packages,
   
   I reviewed the list of extras installed by default and added a few - especially those that did not require any special dependencies (like http/ftp) - they worked previously but they would not work after changing to packages (only selected provider packages get installed!) 
   
   This is the list I came up with after the review. I based it on how "niche" particular providers are (but this is purely my perception, so I might be wrong).
   
   Included: "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
   
   The recommendation is that people build their own images, choose the extras they want, and add extra packages, they need (and our image fully supports customization). But there will be a group of people that will rely on our images for their own production usage, so I think it's the right time to standardize it. We certainly do not want to install all providers, but I do not think we have some clear guidelines of what should be in the reference image so I have to resort to the wisdom of crowd. Once we agree to some proposal here, I will send it to devlist to vote on it.
   
   Maybe better will be to list those which are left out.
   
   Excluded: 
   * all apache providers:  Cassandra, druid, hdfs, hive, kylin livy, pig, pinot, spark, sqoop
   * cloudant
   * databricks
   * datadog
   * dingding
   * discord
   * exasol
   * facebook
   * imap
   * jdbc
   * jenkins
   * jira
   * microsoft.mssql
   * microsoft.winrm
   * mongo
   * odbc
   * openfaas
   * oracle
   * pagerduty
   * papermill
   * plexus
   * presto
   * qubole
   * salesforce
   * samba
   * segment
   * singularity
   * snowflake
   * sqlite
   * vertica
   * yandex
   * zendesk
   
   @ashb, @dimberman @kaxil, @turbaszek, @mik-laj. @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 (also others)  -> can you please take a look and comment if you think this list is "good" for reference image of ours? Any proposals to move providers between "excluded" and "included" ?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735972648


   Hey everyone -  this looks like is going to be green now. It changes the production image to be build using packages rather than directly Airflow Sources. 
   
   This image is now built and verified in the CI and follow up change after this one is merged is to also build image from locally built packages in 2.0 for DockerHub builds. Those images are automatically verified (and #12718 add further checks including nicer way of `pip check`). 
   
   The checks include provider's discovery. 
   
   I also described how to use breeze to build and install provider packages very easily in development environment.
   It takes three commands:
   
   * ` ./breeze prepare-provider-packages [PACKAGE ...]`
   * `./breeze prepare-airflow-packages`
   * `./breeze --install-airflow-version wheel --install-packages-from-dist --skip-mounting-local-sources`
   
   
   Looking forward to comments.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r537062816



##########
File path: Dockerfile.ci
##########
@@ -330,8 +330,8 @@ RUN if [[ ${INSTALL_AIRFLOW_VIA_PIP} == "true" ]]; then \
 # they are also installed additionally to whatever is installed from Airflow.
 COPY docker-context-files/ /docker-context-files/
 
-RUN if [[ ${AIRFLOW_LOCAL_PIP_WHEELS} != "true" ]]; then \
-        if ls /docker-context-files/*.whl 1> /dev/null 2>&1; then \
+RUN if [[ ${INSTALL_FROM_DOCKER_CONTEXT_FILES} != "true" ]]; then \
+        if ls /docker-context-files/*.{whl,tar.gz} 1> /dev/null 2>&1; then \
             pip install --no-deps /docker-context-files/*.whl; \

Review comment:
       Very good point.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533733027



##########
File path: .dockerignore
##########
@@ -60,8 +60,9 @@
 !.github
 !empty
 
-# This folder is for you if you want to add any files to the docker context when you build your own
+# This folder is for you if you want to add any packaages to the docker context when you build your own

Review comment:
       ```suggestion
   # This folder is for you if you want to add any packages to the docker context when you build your own
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735498842


   [The Workflow run](https://github.com/apache/airflow/actions/runs/390666494) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735370370


   [The Workflow run](https://github.com/apache/airflow/actions/runs/389762806) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735509332


   [The Workflow run](https://github.com/apache/airflow/actions/runs/390719145) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-739574421


   [The Workflow run](https://github.com/apache/airflow/actions/runs/404566848) is cancelling this PR. Building images for the PR has failed. Follow the the workflow link to check the reason.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-736726882


   @ashb -> I think you wanted this to happen and it's high time to get the prod image for tests (next step - the Dockerhub one) built from packages,
   
   I reviewed the list of extras installed by default and added a few - especially those that did not require any special dependencies (like http/ftp) - they worked previously but they would not work after changing to packages (only selected provider packages get installed!) 
   
   This is the list I came up with after the review. I based it on how "niche" particular providers are (but this is purely my perception, so I might be wrong).
   
   Included: "async,amazon,celery,cncf.kubernetes,docker,dask,elasticsearch,ftp,grpc,hashicorp,http,google,microsoft.azure,mysql,postgres,redis,sendgrid,sftp,slack,ssh,statsd,virtualenv"
   
   The recommendation is that people build their own images, choose the extras they want, and add extra packages, they need (and our image fully supports customization). But there will be a group of people that will rely on those images for their own production usage, so I think it's the right time to standardize it. We certainly do not want to install all providers, but I do not think we have some clear guidelines of what should be in the reference image so I have to resort to the wisdom of crowd. Once we agree to some proposal here, I will send it to devlist to vote on it.
   
   Maybe better will be to list those which are left:
   Excluded: 
   * all apache providers:  Cassandra, druid, hdfs, hive, kylin livy, pig, pinot, spark, sqoop
   * cloudant
   * databricks
   * datadog
   * dingding
   * discord
   * exasol
   * facebook
   * imap
   * jdbc
   * jenkins
   * jira
   * microsoft.mssql
   * microsoft.winrm
   * mongo
   * odbc
   * openfaas
   * oracle
   * pagerduty
   * papermill
   * plexus
   * presto
   * qubole
   * salesforce
   * samba
   * segment
   * singularity
   * snowflake
   * sqlite
   * vertica
   * yandex
   * zendesk
   
   @ashb, @kaxil, @turbaszek, @mik-laj. @XD-DENG, @feluelle, @eladkal, @ryw, @vikramkoka, @KevinYang21 (also others)  -> can you please take a look and comment if you think this list is "good" for reference image of ours? Any proposals to move providers between "excluded" and "included" ?
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533837169



##########
File path: scripts/in_container/entrypoint_ci.sh
##########
@@ -98,13 +101,51 @@ if [[ -z ${INSTALL_AIRFLOW_VERSION=} ]]; then
     mkdir -p "${AIRFLOW_SOURCES}"/logs/
     mkdir -p "${AIRFLOW_SOURCES}"/tmp/
     export PYTHONPATH=${AIRFLOW_SOURCES}
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "none"  ]]; then
+    echo
+    echo "Skip installing airflow - only install wheel packages that are present locally"
+    echo
+    uninstall_airflow_and_providers
+elif [[ ${INSTALL_AIRFLOW_VERSION} == "wheel"  ]]; then
+    echo
+    echo "Install airflow from wheel package with [all] extras but uninstalling providers."
+    echo
+    uninstall_airflow_and_providers
+    install_airflow_from_wheel "[all]"
+    uninstall_providers
 else
-    # Installs released airflow version without adding more extras
-    install_released_airflow_version  "" "${INSTALL_AIRFLOW_VERSION}"
+    echo
+    echo "Install airflow from PyPI including [all] extras"
+    echo
+    install_released_airflow_version "${INSTALL_AIRFLOW_VERSION}" "[all]"
 fi
-
-if [[ ${INSTALL_WHEELS=} == "true" ]]; then
-  pip install /dist/*.whl || true
+if [[ ${INSTALL_PACKAGES_FROM_DIST=} == "true" ]]; then
+    echo
+    echo "Install all packages from dist folder"
+    if [[ ${INSTALL_AIRFLOW_VERSION} == "wheel" ]]; then
+        echo "(except apache-airflow)"
+    fi

Review comment:
       I looked at it - and the way I implemented it is rather easy to adjust - I could run preparing packages and build images jobs in matrix strategy witth PACKAGE_FORMAT ["wheels", "sdist" ] - I could get it done rather quickly, but I am still not sure if it gives us anything more than installing .whl - > if you can think of a reason why this might be useful, I am happy to add it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533772296



##########
File path: .github/workflows/build-images-workflow-run.yml
##########
@@ -336,7 +336,8 @@ jobs:
         if: steps.defaults.outputs.proceed == 'true'
       - name: "Build CI images ${{ matrix.python-version }}:${{ github.event.workflow_run.id }}"
         run: ./scripts/ci/images/ci_prepare_ci_image_on_ci.sh
-        if: matrix.image-type == 'CI' && steps.defaults.outputs.proceed == 'true'
+        # locally built CI image is needed to prepare packages for PROD image build

Review comment:
       4 minutes for prod image builds only. For most builds it is 4 minutes for 1 job because we only prepare default image. We need it anyway, because we need consistent environment in which we need to prepare all packages. We could potentially optimise in the future and make a parallel environment for that (we would have to see, fi this is  faster or slower). We can also wait with starting production images  until CI image is ready and pull it - which will be 1 minute overhead,  For now this is the best way if we want to make it before 2.0. There are a few ways we can optimise it. But i want to have ti in place before 2.0 is out. 
   
   That was a deliberate decision.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] github-actions[bot] commented on pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#issuecomment-735226772


   [The Workflow run](https://github.com/apache/airflow/actions/runs/388616971) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #12685: Production images on CI are now built from packages

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #12685:
URL: https://github.com/apache/airflow/pull/12685#discussion_r533735419



##########
File path: breeze
##########
@@ -2211,6 +2259,12 @@ function breeze::flag_local_file_mounting() {
 -l, --skip-mounting-local-sources
         Skips mounting local volume with sources - you get exactly what is in the
         docker image rather than your current local sources of Airflow.
+
+--skip-mounting-files

Review comment:
       Why do we need an option to do this? It seems harmless to just always mount this, and the less options/configuration in breeze the better -- it is already hard to follow everything.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org