Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/18 00:40:17 UTC

[GitHub] [airflow] potiuk edited a comment on pull request #10368: CI Images are now pre-build and stored in registry

potiuk edited a comment on pull request #10368:
URL: https://github.com/apache/airflow/pull/10368#issuecomment-675177393


   Hello everyone. This is a major overhaul of how we use GitHub Actions, enabled by features recently released by GitHub Actions (namely the "workflow_run" feature).
   
   I worked on it over the last week and tested it heavily on my fork: https://github.com/potiuk/airflow/actions. I hope there will be only a few teething problems, but I am already very familiar with how GitHub Actions works and will be able to fix any problems quickly. I am also going to watch it after we merge it to make sure it works as expected.
   
   See the commit description for what this change achieves. I have just one thing to say: this is the "dream" architecture of the CI builds that I had in mind at the very beginning of my work on Airflow, and it only became achievable with the most recent changes by GitHub. I really hope this is one of the last fundamental changes in the CI scripting, because I have literally run out of ideas for what can be improved (just kidding - there are always small things ;).
   
   It has many nice properties; the most important ones are:
   
   * 5-12 minutes saved for each job, because images are built only once rather than for each job. That is per job (!), not per whole run. This will help both to increase the number of parallel PRs that can be run and to decrease the feedback time for each build. Builds were sometimes much slower when the Python base image was upgraded or the Dockerfile changed - this problem will be gone.
   
   * the jobs/runs are fully consistent - all jobs in the same build use exactly the same image prepared only once. 
   
   * full trackability and reproducibility of each run - we keep the images in the GitHub registry, and you can recreate the exact failed run by running `./breeze --github-image-id <RUN_ID>`, or `./breeze --github-image-id <COMMIT_ID>` for merged runs.
   
   * I cleaned up outputs of the job so that they only show relevant information
   
   * I cleaned up the initialization code for the bash scripts - I removed some duplicates, organized it better, and fully documented it, describing the purpose of all options (that was the last script refactoring I planned)
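
To make the "per job, not per run" point in the first item concrete, here is a minimal back-of-the-envelope sketch. The 5-12 minute range comes from the list above; the function name and the job count of 20 are hypothetical and only illustrate the arithmetic. The real number of jobs depends on the build matrix, and the sketch ignores the time spent building the image once.

```python
def saved_minutes(job_count, per_job_min=5, per_job_max=12):
    """Return (low, high) total runner minutes saved across all jobs in one run.

    The per-job defaults are the 5-12 minute range quoted in the comment;
    job_count is whatever the build matrix produces for that run.
    """
    if job_count < 0:
        raise ValueError("job_count must be non-negative")
    return job_count * per_job_min, job_count * per_job_max


# Hypothetical run with 20 jobs (an assumption, not a measured figure).
low, high = saved_minutes(20)
print(f"Estimated saving: {low}-{high} runner minutes per run")
```

Since jobs run in parallel, this total is runner time freed up for other PRs, not the wall-clock speedup of a single run; the wall-clock gain for any one job remains the 5-12 minutes quoted above.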
   
   It's quite a huge change, and I can try to split it into smaller ones (but conceptually it is one big overhaul of the way our CI works) 
   
   You can start from the workflows at the end of the documentation: https://github.com/PolideaInternal/airflow/blob/prebuild-ci-images-in-github-actions/CI.rst - I prepared some sequence diagrams of the CI architecture (using mermaid, an absolutely cool tool for converting markdown-like descriptions into really nice diagrams). The documentation explains all the "whys" and also the "hows".
   
   NOTE! For review/merge I needed to disable waiting for images, so the speedups are not visible yet - I have to merge it to master in order to enable the "Build Image" workflows. I also use the "master" version of my own GitHub Cancel Action, which I developed for that purpose - I will release its v2 version and switch to it once we have had a few days of the builds working in Airflow.
   
   I developed a new GitHub Action for "Cancel Workflow Run", https://github.com/potiuk/cancel-workflow-runs, which is a Swiss Army knife of run cancelling, and I plan to share it with Apache Beam and other Apache projects that might need it as well.
   
   I really look forward to review comments and to merging it eventually. This will help all contributors and committers to move faster. This is literally the completion, after 2 years, of the "dream" architecture for our CI :).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org