You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/11/07 11:00:56 UTC

[GitHub] [airflow] potiuk opened a new pull request #12154: Provider packages are installed by default in production image

potiuk opened a new pull request #12154:
URL: https://github.com/apache/airflow/pull/12154


   This is a fix to a problem introduced in #10806. The change
   turned provider packages into namespace packages - which made
   them ignored by find_packages function from setup tools - thus
   prodiuction image build automatically and used by Kubernetes
   tests did not have the provider packages installed.
   
   This PR fixes it and adds future protection during CI tests of
   production image to make sure that provider packages are
   actually installed.
   
   Fixes #12150
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#discussion_r519329413



##########
File path: scripts/ci/images/ci_prepare_prod_image_on_ci.sh
##########
@@ -48,4 +48,61 @@ function build_prod_images_on_ci() {
 }
 
 
+
+function verify_prod_image_has_airflow_and_providers {

Review comment:
       I can do better than that. I already have script that imports all of the providers (I am using that in providers package/backport providers package preparation and it is perfectly suited for that case. it is even very well prepared to handle this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#discussion_r519514151



##########
File path: scripts/ci/images/ci_prepare_prod_image_on_ci.sh
##########
@@ -48,4 +48,61 @@ function build_prod_images_on_ci() {
 }
 
 
+
+function verify_prod_image_has_airflow_and_providers {

Review comment:
       SGTM.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] mik-laj commented on a change in pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
mik-laj commented on a change in pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#discussion_r519192423



##########
File path: scripts/ci/images/ci_prepare_prod_image_on_ci.sh
##########
@@ -48,4 +48,61 @@ function build_prod_images_on_ci() {
 }
 
 
+
+function verify_prod_image_has_airflow_and_providers {

Review comment:
       Can you add a test that will check if at least one provider can be loaded? This will give us the greatest confidence that everything is working.
   ```
   python -c "from airflow.providers.google.common.hooks.base_google import GoogleBaseHook"
   ```
   We may have these files in the image, but for various reasons they may not be loaded by Python, e.g. due to invalid permissions.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#issuecomment-723951625


   Should be fixed now. As discussed - later when we will release provider packages we will switch to the mode of building the package using wheels. But I do not think we need to do it now, I think we need to release PyPI packages first and then we will synchronize the process. I yet have to see how we are going to build those. The Docker Image has all the features needed to build the "master" image differently - using packages.
   
   However requiring going through the hoops of building all packages and installing them locally for local development and testing is not a good idea, so building from sources including all providers should remain to help developers who are actually working on improving the prod image.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #12154:
URL: https://github.com/apache/airflow/pull/12154


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#issuecomment-723951625


   Should be fixed now. As discussed - later when we will release provider packages we will switch to the mode of building the package using wheels. But I do not think we need to do it now, I think we need to release PyPI packages first and then we will synchronize the process. I yet have to see how we are going to build those. The Docker File and scripts have all the features needed to build the "master" image differently - using packages.
   
   However requiring going through the hoops of building all packages and installing them locally for local development and testing is not a good idea, so building from sources including all providers should remain to help developers who are actually working on improving the prod image.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#discussion_r519337904



##########
File path: scripts/ci/images/ci_prepare_prod_image_on_ci.sh
##########
@@ -48,4 +48,61 @@ function build_prod_images_on_ci() {
 }
 
 
+
+function verify_prod_image_has_airflow_and_providers {

Review comment:
       However it needs #12082 . So I'd rather merge it now and add the test in #12082 @mik-laj if that's ok?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#issuecomment-723871757


   > There was a reason I _didn't_ do this. By having this be find_namespace_packages it will _always_ include all the providers in the airflow dist. I.e. it reverses the split out.
   
   Yes. Correct. it should only happen when "install_providers_from_sources" is set. that is the right fix 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #12154: Provider packages are installed by default in production image

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #12154:
URL: https://github.com/apache/airflow/pull/12154#discussion_r519768341



##########
File path: scripts/ci/images/ci_prepare_prod_image_on_ci.sh
##########
@@ -48,4 +48,61 @@ function build_prod_images_on_ci() {
 }
 
 
+
+function verify_prod_image_has_airflow_and_providers {
+    echo
+    echo "Airflow folders installed in the image:"
+    echo
+
+    AIRFLOW_INSTALLATION_LOCATION="/home/airflow/.local"
+
+    docker run --rm --entrypoint /bin/bash "${AIRFLOW_PROD_IMAGE}" -c \
+        'find '"${AIRFLOW_INSTALLATION_LOCATION}"'/lib/python*/site-packages/airflow/ -type d'
+
+    EXPECTED_MIN_AIRFLOW_DIRS_COUNT="60"
+    readonly EXPECTED_MIN_AIRFLOW_DIRS_COUNT
+
+    COUNT_AIRFLOW_DIRS=$(docker run --rm --entrypoint /bin/bash "${AIRFLOW_PROD_IMAGE}" -c \
+         'find '"${AIRFLOW_INSTALLATION_LOCATION}"'/lib/python*/site-packages/airflow/ -type d | grep -c -v "/airflow/providers"')
+
+    echo
+    echo "Number of airflow dirs: ${COUNT_AIRFLOW_DIRS}"
+    echo
+
+    if [[ "${COUNT_AIRFLOW_DIRS}" -lt "${EXPECTED_MIN_AIRFLOW_DIRS_COUNT}" ]]; then
+        >&2 echo
+        >&2 echo Number of airflow folders installed is less than ${EXPECTED_MIN_AIRFLOW_DIRS_COUNT}
+        >&2 echo This is unexpected. Please investigate, looking at the output above!

Review comment:
       Can we add some more information on what needs to be investigated, i.e. something like "Providers might not have been installed" etc




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org