You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/06/22 18:53:53 UTC

[GitHub] [airflow] potiuk opened a new pull request, #24610: Convert selective checks to Breeze Python

potiuk opened a new pull request, #24610:
URL: https://github.com/apache/airflow/pull/24610

   Instead of bash-based, complex logic script to perform PR selective
   checks we now integrated the whole logic into Breeze Python code.
   
   It is now much simplified, when it comes to algorithm. We've
   implemented simple rule-based decision tree. The rules describing
   the decision tree are now are now much easier
   to reason about and they correspond one-to-one with the rules
   that are implemented in the code in rather straightforward way.
   
   The code is much simpler and diagnostics of the selective checks
   has also been vastly improved:
   
   * The rule engine displays status of applying each rule and
     explains (with yellow warning message what decision was made
     and why. Informative messages are printed showing the resulting
     output
   
   * List of files impacting the decision are also displayed
   
   * The names of "ci file group" and "test type" were aligned
   
   * Unit tests covering wide range of cases are added. Each test
     describes what is the case they demonstrate
   
   * `breeze selective-checks` command that is used in CI can also
     be used locally by just providing commit-ish reference of the
     commit to check. This way you can very easily debug problems and
     fix them
   
   Fixes: #19971
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in a newsfragement file, named `{pr_number}.significant.rst`, in [newsfragments](https://github.com/apache/airflow/tree/main/newsfragments).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1164568024

   BTW. By converting the checks to Python I also found that current selectiuve checks were not "selective enough" as I missed the cases where "Helm" tests are run unnecessarily in quite a number of cases (For example they we run when only providers were modified - this should speed up PRs from contributors who only modified one or more providers and did not touch the code).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] edithturn commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
edithturn commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1163631718

   I wouldn't have thought of all those changes, really impressive @potiuk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #24610:
URL: https://github.com/apache/airflow/pull/24610#discussion_r906447025


##########
dev/breeze/src/airflow_breeze/commands/ci_commands.py:
##########
@@ -0,0 +1,237 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import sys
+from typing import Optional, Tuple
+
+import click
+
+from airflow_breeze.commands.main_command import main
+from airflow_breeze.global_constants import (
+    DEFAULT_PYTHON_MAJOR_MINOR_VERSION,
+    MOUNT_ALL,
+    GithubEvents,
+    github_events,
+)
+from airflow_breeze.params.shell_params import ShellParams
+from airflow_breeze.utils.common_options import (
+    option_airflow_constraints_reference,
+    option_answer,
+    option_dry_run,
+    option_github_repository,
+    option_max_age,
+    option_python,
+    option_timezone,
+    option_updated_on_or_after,
+    option_verbose,
+)
+from airflow_breeze.utils.confirm import Answer, user_confirm
+from airflow_breeze.utils.console import get_console
+from airflow_breeze.utils.custom_param_types import BetterChoice
+from airflow_breeze.utils.docker_command_utils import (
+    check_docker_resources,
+    get_env_variables_for_docker_commands,
+    get_extra_docker_flags,
+    perform_environment_checks,
+)
+from airflow_breeze.utils.find_newer_dependencies import find_newer_dependencies
+from airflow_breeze.utils.image import find_available_ci_image
+from airflow_breeze.utils.run_utils import run_command
+
+CI_COMMANDS = {
+    "name": "CI commands",
+    "commands": [
+        "fix-ownership",
+        "free-space",
+        "resource-check",
+        "selective-check",
+        "find-newer-dependencies",
+    ],
+}
+
+CI_PARAMETERS = {
+    "breeze selective-check": [
+        {
+            "name": "Selective check flags",
+            "options": [
+                "--commit-ref",
+                "--pr-labels",
+                "--default-branch",
+                "--github-event-name",
+            ],
+        }
+    ],
+    "breeze find-newer-dependencies": [
+        {
+            "name": "Find newer dependencies flags",
+            "options": [
+                "--python",
+                "--timezone",
+                "--constraints-branch",
+                "--updated-on-or-after",
+                "--max-age",
+            ],
+        }
+    ],
+}
+
+
+@main.command(name="free-space", help="Free space for jobs run in CI.")
+@option_verbose
+@option_dry_run
+@option_answer
+def free_space(verbose: bool, dry_run: bool, answer: str):
+    if user_confirm("Are you sure to run free-space and perform cleanup?") == Answer.YES:
+        run_command(["sudo", "swapoff", "-a"], verbose=verbose, dry_run=dry_run)
+        run_command(["sudo", "rm", "-f", "/swapfile"], verbose=verbose, dry_run=dry_run)
+        run_command(["sudo", "apt-get", "clean"], verbose=verbose, dry_run=dry_run, check=False)
+        run_command(
+            ["docker", "system", "prune", "--all", "--force", "--volumes"], verbose=verbose, dry_run=dry_run
+        )
+        run_command(["df", "-h"], verbose=verbose, dry_run=dry_run)
+        run_command(["docker", "logout", "ghcr.io"], verbose=verbose, dry_run=dry_run, check=False)
+
+
+@main.command(name="resource-check", help="Check if available docker resources are enough.")
+@option_verbose
+@option_dry_run
+def resource_check(verbose: bool, dry_run: bool):
+    perform_environment_checks(verbose=verbose)
+    shell_params = ShellParams(verbose=verbose, python=DEFAULT_PYTHON_MAJOR_MINOR_VERSION)
+    check_docker_resources(shell_params.airflow_image_name, verbose=verbose, dry_run=dry_run)
+
+
+@main.command(name="fix-ownership", help="Fix ownership of source files to be same as host user.")
+@option_github_repository
+@option_verbose
+@option_dry_run
+def fix_ownership(github_repository: str, verbose: bool, dry_run: bool):
+    perform_environment_checks(verbose=verbose)
+    shell_params = find_available_ci_image(github_repository, dry_run, verbose)
+    extra_docker_flags = get_extra_docker_flags(MOUNT_ALL)
+    env = get_env_variables_for_docker_commands(shell_params)
+    cmd = [
+        "docker",
+        "run",
+        "-t",
+        *extra_docker_flags,
+        "--pull",
+        "never",
+        shell_params.airflow_image_name_with_tag,
+        "/opt/airflow/scripts/in_container/run_fix_ownership.sh",
+    ]
+    run_command(
+        cmd, verbose=verbose, dry_run=dry_run, text=True, env=env, check=False, enabled_output_group=True
+    )
+    # Always succeed
+    sys.exit(0)
+
+
+def get_changed_files(commit_ref: Optional[str], dry_run: bool, verbose: bool) -> Tuple[str, ...]:
+    if commit_ref is None:
+        return ()
+    cmd = [
+        "git",
+        "diff-tree",
+        "--no-commit-id",
+        "--name-only",
+        "-r",
+        commit_ref + "^",
+        commit_ref,
+    ]
+    result = run_command(cmd, dry_run=dry_run, verbose=verbose, check=False, capture_output=True, text=True)
+    if result.returncode != 0:
+        get_console().print(
+            f"[warning] Error when running diff-tree command [/]\n{result.stdout}\n{result.stderr}"
+        )
+        return ()
+    changed_files = tuple(result.stdout.splitlines()) if result.stdout else ()
+    get_console().print("\n[info]Changed files:[/]\n")
+    get_console().print(changed_files)
+    get_console().print()
+    return changed_files
+
+
+@main.command(name="selective-check", help="Check if available docker resources are enough.")

Review Comment:
   Can you update description?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] mik-laj commented on a diff in pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
mik-laj commented on code in PR #24610:
URL: https://github.com/apache/airflow/pull/24610#discussion_r905573948


##########
SELECTIVE_CHECKS.md:
##########
@@ -21,20 +21,38 @@
 
 In order to optimise our CI jobs, we've implemented optimisations to only run selected checks for some
 kind of changes. The logic implemented reflects the internal architecture of Airflow 2.0 packages
-and it helps to keep down both the usage of jobs in GitHub Actions as well as CI feedback time to
+and it helps to keep down both the usage of jobs in GitHub Actions abd CI feedback time to

Review Comment:
   ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1163720735

   > I wouldn't have thought of all those changes, really impressive @potiuk.
   
   Just Python is way easier to write things better :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1163503379

   This is the next step in Breeze conversion. This time -800 lines of Bash Code (replaced with Python but including lots of unit tests) which is far more readable and easy to reason about.
   
   Plus very clear diagnosticts of what decisions are made and why, including `breeze static-check` tool that can be used locally to debug and reproduce selective check behaviour with any local commit.
   
   Regular PR that triggers all tests (core change) but does not  run any Helm/K8S/UI javascript tests
   
   ![image](https://user-images.githubusercontent.com/595491/175116212-df43f2c7-ea64-475e-8008-28d795f9230e.png)
   
   More complex change (this one) that trigers all possible tests because environment changed:
   ![image](https://user-images.githubusercontent.com/595491/175116706-08c5d140-9250-4f33-9773-1242324b9912.png)
   
   Fragment of more complex change (amazon + cncf.kubernetes) that triggers only subset of unit tests ("Always", "Providers") bit also all our K8S changes:
   
   ![image](https://user-images.githubusercontent.com/595491/175117116-1ad16849-74ec-425f-a32e-07c7ff8ac0b4.png)
   
   Tests are pretty comprehensive and we will be able to easily test and add any new cases in the future (especially when we split providers and want to make more fine-grained decisions which tests to run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on a diff in pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on code in PR #24610:
URL: https://github.com/apache/airflow/pull/24610#discussion_r906606647


##########
dev/breeze/src/airflow_breeze/commands/ci_commands.py:
##########
@@ -0,0 +1,237 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+import sys
+from typing import Optional, Tuple
+
+import click
+
+from airflow_breeze.commands.main_command import main
+from airflow_breeze.global_constants import (
+    DEFAULT_PYTHON_MAJOR_MINOR_VERSION,
+    MOUNT_ALL,
+    GithubEvents,
+    github_events,
+)
+from airflow_breeze.params.shell_params import ShellParams
+from airflow_breeze.utils.common_options import (
+    option_airflow_constraints_reference,
+    option_answer,
+    option_dry_run,
+    option_github_repository,
+    option_max_age,
+    option_python,
+    option_timezone,
+    option_updated_on_or_after,
+    option_verbose,
+)
+from airflow_breeze.utils.confirm import Answer, user_confirm
+from airflow_breeze.utils.console import get_console
+from airflow_breeze.utils.custom_param_types import BetterChoice
+from airflow_breeze.utils.docker_command_utils import (
+    check_docker_resources,
+    get_env_variables_for_docker_commands,
+    get_extra_docker_flags,
+    perform_environment_checks,
+)
+from airflow_breeze.utils.find_newer_dependencies import find_newer_dependencies
+from airflow_breeze.utils.image import find_available_ci_image
+from airflow_breeze.utils.run_utils import run_command
+
+CI_COMMANDS = {
+    "name": "CI commands",
+    "commands": [
+        "fix-ownership",
+        "free-space",
+        "resource-check",
+        "selective-check",
+        "find-newer-dependencies",
+    ],
+}
+
+CI_PARAMETERS = {
+    "breeze selective-check": [
+        {
+            "name": "Selective check flags",
+            "options": [
+                "--commit-ref",
+                "--pr-labels",
+                "--default-branch",
+                "--github-event-name",
+            ],
+        }
+    ],
+    "breeze find-newer-dependencies": [
+        {
+            "name": "Find newer dependencies flags",
+            "options": [
+                "--python",
+                "--timezone",
+                "--constraints-branch",
+                "--updated-on-or-after",
+                "--max-age",
+            ],
+        }
+    ],
+}
+
+
+@main.command(name="free-space", help="Free space for jobs run in CI.")
+@option_verbose
+@option_dry_run
+@option_answer
+def free_space(verbose: bool, dry_run: bool, answer: str):
+    if user_confirm("Are you sure to run free-space and perform cleanup?") == Answer.YES:
+        run_command(["sudo", "swapoff", "-a"], verbose=verbose, dry_run=dry_run)
+        run_command(["sudo", "rm", "-f", "/swapfile"], verbose=verbose, dry_run=dry_run)
+        run_command(["sudo", "apt-get", "clean"], verbose=verbose, dry_run=dry_run, check=False)
+        run_command(
+            ["docker", "system", "prune", "--all", "--force", "--volumes"], verbose=verbose, dry_run=dry_run
+        )
+        run_command(["df", "-h"], verbose=verbose, dry_run=dry_run)
+        run_command(["docker", "logout", "ghcr.io"], verbose=verbose, dry_run=dry_run, check=False)
+
+
+@main.command(name="resource-check", help="Check if available docker resources are enough.")
+@option_verbose
+@option_dry_run
+def resource_check(verbose: bool, dry_run: bool):
+    perform_environment_checks(verbose=verbose)
+    shell_params = ShellParams(verbose=verbose, python=DEFAULT_PYTHON_MAJOR_MINOR_VERSION)
+    check_docker_resources(shell_params.airflow_image_name, verbose=verbose, dry_run=dry_run)
+
+
+@main.command(name="fix-ownership", help="Fix ownership of source files to be same as host user.")
+@option_github_repository
+@option_verbose
+@option_dry_run
+def fix_ownership(github_repository: str, verbose: bool, dry_run: bool):
+    perform_environment_checks(verbose=verbose)
+    shell_params = find_available_ci_image(github_repository, dry_run, verbose)
+    extra_docker_flags = get_extra_docker_flags(MOUNT_ALL)
+    env = get_env_variables_for_docker_commands(shell_params)
+    cmd = [
+        "docker",
+        "run",
+        "-t",
+        *extra_docker_flags,
+        "--pull",
+        "never",
+        shell_params.airflow_image_name_with_tag,
+        "/opt/airflow/scripts/in_container/run_fix_ownership.sh",
+    ]
+    run_command(
+        cmd, verbose=verbose, dry_run=dry_run, text=True, env=env, check=False, enabled_output_group=True
+    )
+    # Always succeed
+    sys.exit(0)
+
+
+def get_changed_files(commit_ref: Optional[str], dry_run: bool, verbose: bool) -> Tuple[str, ...]:
+    if commit_ref is None:
+        return ()
+    cmd = [
+        "git",
+        "diff-tree",
+        "--no-commit-id",
+        "--name-only",
+        "-r",
+        commit_ref + "^",
+        commit_ref,
+    ]
+    result = run_command(cmd, dry_run=dry_run, verbose=verbose, check=False, capture_output=True, text=True)
+    if result.returncode != 0:
+        get_console().print(
+            f"[warning] Error when running diff-tree command [/]\n{result.stdout}\n{result.stderr}"
+        )
+        return ()
+    changed_files = tuple(result.stdout.splitlines()) if result.stdout else ()
+    get_console().print("\n[info]Changed files:[/]\n")
+    get_console().print(changed_files)
+    get_console().print()
+    return changed_files
+
+
+@main.command(name="selective-check", help="Check if available docker resources are enough.")

Review Comment:
   Right! 🤦 . Fixed!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] edithturn commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
edithturn commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1173930350

   It was merged, well done @potiuk Jarek 🥳 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1166162319

   All Green!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1166195199

   The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk merged pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk merged PR #24610:
URL: https://github.com/apache/airflow/pull/24610


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #24610: Convert selective checks to Breeze Python

Posted by GitBox <gi...@apache.org>.
potiuk commented on PR #24610:
URL: https://github.com/apache/airflow/pull/24610#issuecomment-1163505222

   cc: @edithturn @bowrna : -800 lines of bash less and much easier "logic" to grasp for selective checks implementation


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org