You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/04/20 01:30:40 UTC

[GitHub] [airflow] potiuk opened a new pull request, #23104: Use new Breese for building, pulling and verifying the images.

potiuk opened a new pull request, #23104:
URL: https://github.com/apache/airflow/pull/23104

   We have the new Python-based breeze and we want to replace all
   the functionality used from the old Breeze with the new one.
   
   The most important is image management. It is used in many
   places and this PR replaces all the places and removes image
   builidng, pushing, pulling to use the new Breeze everywhere.
   
   That includes:
   
   * Building and pushing CI image on CI
   * Building and pushing PROD image on CI which includes:
     * building Airflow packages
     * building Provider packages
     * waiting for images in parallel (both CI and and PROD)
     * veifying the images (both CI and PROD)
   
   All those commands have been moved to breeze.py and that required
   some modification in the Breeze parameter handling - mainly related
   to adding spaces when help was displayed because long list of
   packages looked very bad in help output.
   
   It's been easier to implement it in one big PR as using
   image building and pulling was deeply embedded in many scripts.
   
   With this change all the scripts in CI use directly breeze
   and combine using command line parameters with evn variables
   directly in the job that execute the breeze commands which
   makes it much easier to understand what is going on and
   repeat it locally using Breeze.
   
   Fixes: #22825
   Fixes: #23077
   Fixes: #23076
   Fixes: #22829
   Fixes: #22828
   Fixes: #22826
   Fixes: #20961
   Fixes: #23102
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/main/UPDATING.md).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1104334261

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1104273051

   cc: @Bowrna @edithturn -> getting much closer to replace all of it with the new Breeze :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1107406260

   All right. All green. Merging.
   
   I tested:
   
   * PR coming from fork here
   * main build here: https://github.com/potiuk/airflow/actions/runs/2210724864
   * PR coming from airflow here: https://github.com/apache/airflow/pull/23150
   
   That should cover all scenarios - there is slight chance of problems with cache refreshing but we can handle it if we see a problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #23104:
URL: https://github.com/apache/airflow/pull/23104


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #23104:
URL: https://github.com/apache/airflow/pull/23104#discussion_r854433222


##########
tests/system/providers/google/bigquery/example_bigquery_sensors.py:
##########
@@ -23,6 +23,7 @@
 from datetime import datetime
 
 from airflow import models
+from airflow.models.baseoperator import BaseOperator

Review Comment:
   this came as part of the semi-rndom "mypy" failures. Mypy detected slightly different errors when there was a different set of files - so I had some 20 more mypy failures with this change. 
   
    I thought it has been merged separately but I will definitely check it (I will myself do a thorough review of it before I merge :). 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1104267648

   This is the currernt set of commands supported now: 
   
   ![image](https://user-images.githubusercontent.com/595491/164295871-ff7e1718-b0c2-446c-8ecb-eab5027f8a57.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] eladkal commented on a diff in pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
eladkal commented on code in PR #23104:
URL: https://github.com/apache/airflow/pull/23104#discussion_r854426385


##########
tests/system/providers/google/bigquery/example_bigquery_sensors.py:
##########
@@ -23,6 +23,7 @@
 from datetime import datetime
 
 from airflow import models
+from airflow.models.baseoperator import BaseOperator

Review Comment:
   Why changing system tests is part of this PR?



##########
dev/REFRESHING_CI_CACHE.md:
##########
@@ -91,18 +91,57 @@ docker run --privileged --rm tonistiigi/binfmt --install all
 
 More information can be found [here](https://docs.docker.com/engine/reference/commandline/buildx_create/)
 
-The images can be rebuilt and refreshed after the constraints are pushed. Refreshing image for particular
-python version is a simple as running the [refresh_images.sh](refresh_images.sh) script with python version
-as parameter:
+However, emulation is very slow - more than 10x slower than hardware-backed builds.
+
+## Setting up cache refreshing with hardware ARM/AMD support
+
+If you plan to build  a number of images, probably better solution is to set up a hardware remote builder

Review Comment:
   ```suggestion
   If you plan to build a number of images, probably better solution is to set up a hardware remote builder
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1104266008

   Apologies for such a HUGE PR, but I could not find a better way of replacing old with  new Breeze's  build/pull/push image.
   
   I could keep on replacing piece by piece but keeping the two in parallel in CI was really burdensome. That's why I replaced all the related build/pull/push tasks in one go. I am still testing it (I will do some tests with "main" push to my own repo).  But it should be generally working.
   
   It removes another 1000 lines of bash code (YAY!) and paves the way to remove nearly all of them in subsequent PRs. 
   
   Most of all It unifies the way how CI and local development works - rather than having scripts on CI and local breeze commands, everythong is done via python-based breeze commands (including running stuff in parallel on CI which does not use Gnu Parallel but Python Multiprocessing)
   
   It's way neater, nicere and much easier to reproduce locally. Example - most complex "building PROD image" is this:
   
   ````yaml
         - name: "Build & Push PROD image ${{ matrix.python-version }}:${{ env.IMAGE_TAG_FOR_THE_BUILD }}"
           run: |
             rm -f "./dist/"*.{whl,tar.gz}
             rm -f "./docker-context-files/"*.{whl,tar.gz}
             breeze prepare-provider-packages \
                 --package-list-file scripts/ci/installed_providers.txt" \
                 --package-format wheel
             breeze prepare-airflow-package --package-format wheel
             mv "./dist/"*.{whl,tar.gz} "./docker-context-files"
             breeze build-prod-image --push-image --install-from-docker-context-files
           env:
             PYTHON_MAJOR_MINOR_VERSION: ${{ matrix.python-version }}
             UPGRADE_TO_NEWER_DEPENDENCIES: ${{ needs.build-info.outputs.upgradeToNewerDependencies }}
             DOCKER_CACHE: ${{ needs.build-info.outputs.cacheDirective }}
             IMAGE_TAG: ${{ env.IMAGE_TAG_FOR_THE_BUILD }}
   ````
   
   Which you can very easily reproduce locally with the new breeze. It combines (for clarity of the commands) both evn-vars and flags, but all the relevant env vars used are now added directly in each step rather than in job or in workflow, which makes it much more obvious what's going on.
   
   @mik-laj  - I also removed the "special" preparation of virtualenv for docker tests - the docker tests now can be executed via `breeze` command and they will automatically use breeze's virtualenv (which already containts pytest, pytest-xdistts and requests):
   
   
   
   
   
   
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #23104: Use new Breeze for building, pulling and verifying the images.

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #23104:
URL: https://github.com/apache/airflow/pull/23104#issuecomment-1104290139

   > This is massive! tada
   
   Oh yeah. Unfortunately. Now You know why your PRs were neglected a bit recently ;). The nice thing about it that it really implements all the remaining bits of the "breeze" infrastructure (for example running stuff in parallel is now embedded in some Breeze's commands which make the maintenance stuff much nicer and easier to perform by anyone without too much of a learning curve. After we finish - pretty much everything what happens in CI will be just the right sequence of breeze commands.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org