You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/19 09:36:27 UTC

[GitHub] [airflow] potiuk opened a new pull request #11656: Optimizes CI builds heavily with selective checks

potiuk opened a new pull request #11656:
URL: https://github.com/apache/airflow/pull/11656


   * Images are not built if the change is not touching code or docs.
   * In case we have no need for CI images we run stripped-down
     pre-commit checks which skip the long checks and only run for
     changed files
   * If none of the CLI/Providers/Kubernetes/WWW files changed
     the relevant tests are skipped, unless some of the core files
     changed as well.
   * The selective checks logic is explained and documented.
   
   This is the second attempt at the problem with better
   strategy to get the list of files from the incoming PR.
   
   The strategy works now better in a number of cases:
   * when PR comes from the same repo
   * when PR comes from the pull_repo
   * when PR contains more than one commit
   * when PR is based on older master and GitHub creates
     merge commit
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507821146



##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"
 }
 
-function run_all_tests_when_push_or_schedule() {
-    if [[ ${GITHUB_EVENT_NAME} == "push" || ${GITHUB_EVENT_NAME} == "schedule" ]]; then
-        echo
-        echo "Always run all tests in case of push/schedule events"
-        echo
-        set_outputs_run_all_tests
-        exit
+function check_if_api_tests_should_be_run() {
+    local pattern_array=(
+        "^airflow/api"
+    )
+    show_changed_files

Review comment:
       Yep. See here: https://tldp.org/LDP/abs/html/localvar.html. This is built in bash:
   
   > However, as Thomas Braunberger points out, a local variable declared in a function is also visible to functions called by the parent function.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507819445



##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"

Review comment:
       Yep. Reminder from previous version.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712580528


   If we won't try - we won't know :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712580528


   If we don't try - we won't know :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-711994355


   Tested heavily:
   
   Simple PR same repo: 
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315091925
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315091891
   
   Simple PR other repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315101915
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315101895
   
   Multi-commit PR same repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315118421
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315115637
   
   Mutil-commit PR same repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315126705
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315126671
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507824946



##########
File path: CI.rst
##########
@@ -629,6 +691,90 @@ delete old artifacts that are > 7 days old. It only runs for the 'apache/airflow
 We also have a script that can help to clean-up the old artifacts:
 `remove_artifacts.sh <dev/remove_artifacts.sh>`_
 
+Selective CI Checks
+===================
+
+In order to optimise our CI builds, we've implemented optimisations to only run selected checks for some
+kind of changes. The logic implemented reflects the internal architecture of Airflow 2.0 packages
+and it helps to keep down both the usage of jobs in GitHub Actions as well as CI feedback time to
+contributors in case of simpler changes.
+
+We have the following test types (separated by packages in which they are):
+
+* Core - for the core Airflow functionality (core folder)

Review comment:
       I actually moved the tests to "core" folder in the previous PR because I needed cleaner "per directory" split of tests in pytest. In pytest it is far easier to say "run tests" from that directory rather than "run tests from all the files in this directory but do not recurse to this specific directory". So I moved all the tests that were in "airflow/*.py" to "airflow/core/"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712005444


   @ashb @kaxil -> I solved it by rather straightforward approach.
   
   * In the "Build" workflow I take as base the "targetCommitSha" -> i already implemented in the "cancel-workflow-run" the logic to pick the right commit depending on whether merge commit is done by GitHub or not
   
   * In the "CI Build" workflow I take as base the "${ github.sha }" which is also good and it's either the merge commit (if there is one) or the source commit (if there is no merge commit - which is the case when there is only one commit in PR and the PR is based on the current master). 
   
   In both cases I check which files has changed by generating list of changes in the commit vs. it's first parent (^). This seems to work perfectly well. Seems that in GitHub even in case of Merge Commit the first parent is the "target branch". It could be the other way, but seems that in GitHub it is always this direction. And in any case it does not matter because if you have a single merge commit generated from sequence of commits and merged to master, no matter which side you compare, comparing with either parents should show you the same files.
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712310143


   All good, this is just random AWS (rare) unrelated failure.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507824946



##########
File path: CI.rst
##########
@@ -629,6 +691,90 @@ delete old artifacts that are > 7 days old. It only runs for the 'apache/airflow
 We also have a script that can help to clean-up the old artifacts:
 `remove_artifacts.sh <dev/remove_artifacts.sh>`_
 
+Selective CI Checks
+===================
+
+In order to optimise our CI builds, we've implemented optimisations to only run selected checks for some
+kind of changes. The logic implemented reflects the internal architecture of Airflow 2.0 packages
+and it helps to keep down both the usage of jobs in GitHub Actions as well as CI feedback time to
+contributors in case of simpler changes.
+
+We have the following test types (separated by packages in which they are):
+
+* Core - for the core Airflow functionality (core folder)

Review comment:
       I actually moved the tests to "core" folder in the previous PR because I needed cleaner "per directory" split of tests in pytest. In pytest it is far easier to say "run tests" from that directory rather than "run tests from all the files in this directory but do not recurse to this specific directory". So I moved all the tests that were in "tests/*.py" to "tests/core/"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507815811



##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh
+      - name: >
+          Fetch incoming commit ${{ github.sha }} with its parent
+        uses: actions/checkout@v2
+        with:
+          ref: ${{ github.sha }}
+          fetch-depth: 2

Review comment:
       We need to pull in the parent commit as well. By default checkout action pulls only teh one that is being built and we need the parent to get the list of changes.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507817567



##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"
 }
 
-function run_all_tests_when_push_or_schedule() {
-    if [[ ${GITHUB_EVENT_NAME} == "push" || ${GITHUB_EVENT_NAME} == "schedule" ]]; then
-        echo
-        echo "Always run all tests in case of push/schedule events"
-        echo
-        set_outputs_run_all_tests
-        exit
+function check_if_api_tests_should_be_run() {
+    local pattern_array=(
+        "^airflow/api"
+    )
+    show_changed_files
+
+    if [[ $(count_changed_files) == "0" ]]; then
+        needs_api_tests "false"
+    else
+        needs_api_tests "true"
+    fi
+}
+
+function check_if_helm_tests_should_be_run() {
+    local pattern_array=(
+        "^chart"
+    )
+    show_changed_files
+
+    if [[ $(count_changed_files) == "0" ]]; then
+        needs_helm_tests "false"
+    else
+        needs_helm_tests "true"
+    fi
+}
+
+function check_if_docs_should_be_generated() {
+    local pattern_array=(
+        "^docs$"
+        "\.py$"

Review comment:
       True!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507751942



##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh
+      - name: >
+          Fetch incoming commit ${{ github.sha }} with its parent
+        uses: actions/checkout@v2
+        with:
+          ref: ${{ github.sha }}
+          fetch-depth: 2

Review comment:
       Do we need this here? Haven't we already decided what we've built, and the commit in question will already be built. 

##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh

Review comment:
       Not important, but we could probably remove the free space check here if we aren't pulling in big images

##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"
 }
 
-function run_all_tests_when_push_or_schedule() {
-    if [[ ${GITHUB_EVENT_NAME} == "push" || ${GITHUB_EVENT_NAME} == "schedule" ]]; then
-        echo
-        echo "Always run all tests in case of push/schedule events"
-        echo
-        set_outputs_run_all_tests
-        exit
+function check_if_api_tests_should_be_run() {
+    local pattern_array=(
+        "^airflow/api"
+    )
+    show_changed_files
+
+    if [[ $(count_changed_files) == "0" ]]; then
+        needs_api_tests "false"
+    else
+        needs_api_tests "true"
+    fi
+}
+
+function check_if_helm_tests_should_be_run() {
+    local pattern_array=(
+        "^chart"
+    )
+    show_changed_files
+
+    if [[ $(count_changed_files) == "0" ]]; then
+        needs_helm_tests "false"
+    else
+        needs_helm_tests "true"
+    fi
+}
+
+function check_if_docs_should_be_generated() {
+    local pattern_array=(
+        "^docs$"
+        "\.py$"

Review comment:
       ```suggestion
           "^airflow/.*\.py$"
   ```
   
   We don't need to rebuild docs for test-only changes.

##########
File path: CI.rst
##########
@@ -629,6 +691,90 @@ delete old artifacts that are > 7 days old. It only runs for the 'apache/airflow
 We also have a script that can help to clean-up the old artifacts:
 `remove_artifacts.sh <dev/remove_artifacts.sh>`_
 
+Selective CI Checks
+===================
+
+In order to optimise our CI builds, we've implemented optimisations to only run selected checks for some
+kind of changes. The logic implemented reflects the internal architecture of Airflow 2.0 packages
+and it helps to keep down both the usage of jobs in GitHub Actions as well as CI feedback time to
+contributors in case of simpler changes.
+
+We have the following test types (separated by packages in which they are):
+
+* Core - for the core Airflow functionality (core folder)

Review comment:
       There isn't a core folder -- do you mean airflow/, if not one of the more specific examples below?

##########
File path: CI.rst
##########
@@ -32,6 +32,69 @@ However part of the philosophy we have is that we are not tightly coupled with a
 environments we use. Most of our CI jobs are written as bash scripts which are executed as steps in
 the CI jobs. And we have  a number of variables determine build behaviour.
 
+
+
+
+Github Actions runs
+-------------------
+
+Our builds on CI I highly optimized. They utilise some of the latest features provided by GitHub Actions

Review comment:
       ```suggestion
   Our builds on CI are highly optimized. They utilise some of the latest features provided by GitHub Actions
   ```
   
   (I think you meant `I` -> `is`, but `are` is more correct.)

##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"
 }
 
-function run_all_tests_when_push_or_schedule() {
-    if [[ ${GITHUB_EVENT_NAME} == "push" || ${GITHUB_EVENT_NAME} == "schedule" ]]; then
-        echo
-        echo "Always run all tests in case of push/schedule events"
-        echo
-        set_outputs_run_all_tests
-        exit
+function check_if_api_tests_should_be_run() {
+    local pattern_array=(
+        "^airflow/api"
+    )
+    show_changed_files

Review comment:
       Wait, this works? `show_changed_files` calls `get_regexp_from_patterns`, and that references `pattern_array` -- and it can pick up the local array defined here?
   
   That was not how I thought local worked. I can see how the scoping rules of bash made this the case, but I guess I was expecting something else.

##########
File path: scripts/ci/selective_ci_checks.sh
##########
@@ -121,76 +182,93 @@ function show_changed_files() {
 # Output:
 #    Count of changed files matching the patterns
 function count_changed_files() {
+    local count_changed_files
     count_changed_files=$(echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true)
     echo "${count_changed_files}"

Review comment:
       ```suggestion
       echo "${CHANGED_FILES}" | grep -c -E "$(get_regexp_from_patterns)" || true
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712228353


   All addressed I think!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] ashb commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
ashb commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r508027959



##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh

Review comment:
       I don't see any image pulls happening in https://github.com/potiuk/airflow/runs/1274138083?check_suite_focus=true




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r508036142



##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh

Review comment:
       Ah yeah :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#discussion_r507814890



##########
File path: .github/workflows/ci.yml
##########
@@ -202,12 +228,58 @@ jobs:
         env:
           VERBOSE: false
 
+  # Those checks are run if no image needs to be built for checks. This is for simple changes that
+  # Do not touch any of the python code or any of the important files that might require building
+  # The CI Docker image and they can be run entirely using the pre-commit virtual environments on host
+  static-checks-basic-checks-only:
+    timeout-minutes: 30
+    name: "Static checks: basic checks only"
+    runs-on: ubuntu-latest
+    needs: [build-info]
+    env:
+      SKIP: "build,mypy,flake8,pylint,bats-in-container-tests"
+      MOUNT_LOCAL_SOURCES: "true"
+      PYTHON_MAJOR_MINOR_VERSION: ${{needs.build-info.outputs.defaultPythonVersion}}
+    if: >
+      needs.build-info.outputs.basic-checks-only == 'true' &&
+      (github.repository == 'apache/airflow' || github.event_name != 'schedule')
+    steps:
+      - name: "Checkout ${{ github.ref }} ( ${{ github.sha }} )"
+        uses: actions/checkout@v2
+      - name: "Setup python"
+        uses: actions/setup-python@v2
+        with:
+          python-version: ${{needs.build-info.outputs.defaultPythonVersion}}
+      - name: Cache pre-commit env
+        uses: actions/cache@v2
+        env:
+          cache-name: cache-pre-commit-master-basic-checks-v1
+        with:
+          path: ~/.cache/pre-commit
+          key: ${{ env.cache-name }}-${{ github.job }}-${{ hashFiles('.pre-commit-config.yaml') }}
+      - name: "Free space"
+        run: ./scripts/ci/tools/ci_free_space_on_ci.sh

Review comment:
       We are still pulling them in :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712005444


   @ashb @kaxil -> I solved it by rather straightforward approach.
   
   * In the "Build" workflow I take as base the "targetSHA" -> i already implemented in the "cancel-workflow-run" the logic to pick the right commit depending on whether merge commit is done by GitHub or not
   
   * In the "CI Build" workflow I take as base the "${ github.sha }" which is also good and it's either the merge commit (if there is one) or the source commit (if there is no merge commit - which is the case when there is only one commit in PR and the PR is based on the current master). 
   
   In both cases I check which files has changed by generating list of changes in the commit vs. it's first parent (^). This seems to work perfectly well. Seems that in GitHub even in case of Merge Commit the first parent is the "target branch". It could be the other way, but seems that in GitHub it is always this direction. And in any case it does not matter because if you have a single merge commit generated from sequence of commits and merged to master, no matter which side you compare, comparing with either parents should show you the same files.
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712005444


   @ashb @kaxil -> I solved it by rather straightforward approach.
   
   * In the "Build" workflow I take as base the "targetCommitSha" -> I already implemented in the "cancel-workflow-run" the logic to pick the right commit depending on whether merge commit is done by GitHub or not
   
   * In the "CI Build" workflow I take as base the "${ github.sha }" which is also good and it's either the merge commit (if there is one) or the source commit (if there is no merge commit - which is the case when there is only one commit in PR and the PR is based on the current master). 
   
   In both cases I check which files has changed by generating list of changes in the commit vs. it's first parent (^). For that I fetch the commit I want with 'fetch-depth' set to 2. So this is also rather fast and optimal - without fetching the whole history.
   
    This seems to work perfectly well. Seems that in GitHub even in case of Merge Commit the first parent is the "target branch". It could be the other way, but seems that in GitHub it is always this direction. And in any case it does not matter because if you have a single merge commit generated from sequence of commits and merged to master, no matter which side you compare, comparing with either parents should show you the same files.
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712422237


   Also the last nitpick addressed


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk merged pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk merged pull request #11656:
URL: https://github.com/apache/airflow/pull/11656


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-712005444


   @ashb @kaxil -> I solved it by rather straightforward approach.
   
   * In the "Build" workflow I take as base the "targetCommitSha" -> I already implemented in the "cancel-workflow-run" the logic to pick the right commit depending on whether merge commit is done by GitHub or not
   
   * In the "CI Build" workflow I take as base the "${ github.sha }" which is also good and it's either the merge commit (if there is one) or the source commit (if there is no merge commit - which is the case when there is only one commit in PR and the PR is based on the current master). 
   
   In both cases I check which files has changed by generating list of changes in the commit vs. it's first parent (^). This seems to work perfectly well. Seems that in GitHub even in case of Merge Commit the first parent is the "target branch". It could be the other way, but seems that in GitHub it is always this direction. And in any case it does not matter because if you have a single merge commit generated from sequence of commits and merged to master, no matter which side you compare, comparing with either parents should show you the same files.
   
   
   
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk edited a comment on pull request #11656: Optimizes CI builds heavily with selective checks

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11656:
URL: https://github.com/apache/airflow/pull/11656#issuecomment-711994355


   Tested heavily:
   
   Simple PR same repo: 
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315091925
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315091891
   
   Simple PR other repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315101915
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315101895
   
   Multi-commit PR same repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315118421
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315115637
   
   Mutil-commit PR other repo:
   * Build Image: https://github.com/potiuk/airflow/actions/runs/315126705
   * CI Image: https://github.com/potiuk/airflow/actions/runs/315126671
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org