Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/10 16:09:15 UTC

[GitHub] [airflow] potiuk opened a new pull request #11402: Split tests to more sub-types

potiuk opened a new pull request #11402:
URL: https://github.com/apache/airflow/pull/11402


   We seem to have a problem with running all tests at once, most
   likely due to resource problems in our CI, so it makes sense to
   split the tests into more batches. This is not yet a full
   implementation of selective tests, but it goes in that direction
   by splitting the tests into Core/Providers/API/CLI types. The full
   selective tests approach will be implemented as part of issue #10507.
   
   This split is possible thanks to #10422, which moved building the
   image to a separate workflow. Each image is now built only once and
   uploaded to a shared registry, from which the jobs download it
   quickly instead of each building it separately. This lets us run
   many more jobs, as there is very little per-job overhead before
   the tests start running.
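
A minimal sketch of how such a split might be expressed, assuming a simple mapping from test type to test directories. The four type names come from the description above; the directory paths and the helper name `select_test_paths` are illustrative assumptions, not necessarily the layout this PR uses.

```python
from typing import Dict, List

# Hypothetical mapping: the type names come from the PR description,
# the directory paths are assumptions made for illustration only.
TEST_TYPE_DIRS: Dict[str, List[str]] = {
    "Core": ["tests/core"],
    "Providers": ["tests/providers"],
    "API": ["tests/api"],
    "CLI": ["tests/cli"],
}


def select_test_paths(test_type: str) -> List[str]:
    """Return the directories a job of the given type would hand to pytest."""
    try:
        return TEST_TYPE_DIRS[test_type]
    except KeyError:
        known = ", ".join(sorted(TEST_TYPE_DIRS))
        raise ValueError(
            f"Unknown test type {test_type!r}; expected one of: {known}"
        ) from None
```

Each CI job would then run only the paths for its own type, which keeps the per-job test count (and resource use) small.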
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706665870


   Hey Everyone. I think with this change we have a chance to finally reach stability of the tests.
   
   I split the tests into multiple jobs, and I think I have a very nice split. I believe that with the split I introduced, we will hit resource limitations far less frequently (even if we run many more jobs). Each of the test jobs runs no longer than ~7 minutes (including pulling the image built once in the separate workflow). Additionally, I've added all the test types to Breeze, so that it will be really easy to reproduce every job type locally. Once you have the image built locally with Breeze (it is also built only once), it takes up to 4-5 minutes (or less, depending on your machine type) to re-run the complete failed set and reproduce the failure.
   
   Additionally, when a test job fails, I print some useful instructions on how to reproduce the failed run locally. In case you have not noticed, it is now very easy to reproduce a failed build from CI using the RUN_ID from GitHub Actions (you can pull the very image that was used to run the tests). This information is now printed when tests fail, so it is immediately visible to the author and committer, and reproducing the failed tests will be a ..... BREEZE.
   
   ```
   *******************************************************************************************************
   *
   * ERROR! Some tests failed, unfortunately. Those might be transient errors,
   *        but usually you have to fix something.
   *        See the above log for details.
   *
   *******************************************************************************************************
   *  You can easily reproduce the failed tests on your dev machine.
   *
   *   When you have the source branch checked out locally:
   *
   *     Run all tests:
   *
   *       ./breeze --backend postgres --python 3.6 --db-reset --test-type Core tests
   *
   *     Enter docker shell:
   *
   *       ./breeze --backend postgres --python 3.6 --db-reset --test-type Core shell
   *
   *   When you do not have sources:
   *
   *     Run all tests:
   *
   *       ./breeze --github-image-id NNNNNNNN --backend postgres --python 3.6 --db-reset --test-type Core tests
   *
   *     Enter docker shell:
   *
   *       ./breeze --github-image-id NNNNNNNN --backend postgres --python 3.6 --db-reset --test-type Core shell
   *
   *
   *   NOTE! Once you are in the docker shell, you can run a failed test with:
   *
   *            pytest [TEST_NAME]
   *
   *   You can copy the test name from the output above
   *
   *******************************************************************************************************
   ```
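
The commands in the message above all share one shape, so they can be generated from the job's parameters. The sketch below is only an illustration of that idea: the helper name `breeze_command` and its defaults are assumptions, and the image flag is spelled `--github-image-id` on the assumption that this is the intended spelling of the flag shown in the message.

```python
from typing import List, Optional


def breeze_command(
    action: str,
    test_type: str,
    backend: str = "postgres",
    python: str = "3.6",
    image_id: Optional[str] = None,
) -> str:
    """Assemble a ./breeze command line in the shape shown in the failure message.

    `action` is the final word of the command ("tests" or "shell").
    `image_id` is only needed when the sources are not checked out locally.
    """
    args: List[str] = ["./breeze"]
    if image_id is not None:
        # Assumed flag spelling; the value is the image/run id from CI.
        args += ["--github-image-id", image_id]
    args += [
        "--backend", backend,
        "--python", python,
        "--db-reset",
        "--test-type", test_type,
        action,
    ]
    return " ".join(args)
```

For example, `breeze_command("shell", "Core", image_id="NNNNNNNN")` yields the "enter docker shell" variant for the case without local sources.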
   
   
   
   
   
   





[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706573098


   This is an attempt to improve the stability of our tests. I am still trying it out, and it will likely fail (I had to move tests from the "tests" to the "core" directory, which will likely cause some more trouble), but I think it's going in the right direction: we will have far fewer tests to run per job but many more jobs to run. I think that will be fine because those jobs will run much, much faster, and I hope the 137 "errors" will be gone (I also moved "backfill_job" to Heisentests for now).
   
   The next step will be to run only a subset of tests for non-core-related changes, as described in #10507.
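
To illustrate what "only a subset of tests for non-core-related changes" could mean in practice, here is a rough sketch. It is not the design from #10507: the path prefixes, the helper name `select_test_types`, and the fall-back rule (run everything when any changed file is outside the known non-core areas) are all assumptions made for this example.

```python
from typing import Dict, Iterable, List

ALL_TEST_TYPES: List[str] = ["Core", "Providers", "API", "CLI"]

# Hypothetical path prefixes for the non-core areas.
PREFIX_TO_TYPE: Dict[str, str] = {
    "airflow/providers/": "Providers",
    "tests/providers/": "Providers",
    "airflow/api/": "API",
    "tests/api/": "API",
    "airflow/cli/": "CLI",
    "tests/cli/": "CLI",
}


def select_test_types(changed_files: Iterable[str]) -> List[str]:
    """Pick the test types to run for a set of changed files.

    Any file that matches none of the known prefixes is treated as a
    core-related change, in which case every test type is selected.
    """
    selected = set()
    for path in changed_files:
        for prefix, test_type in PREFIX_TO_TYPE.items():
            if path.startswith(prefix):
                selected.add(test_type)
                break
        else:
            # No prefix matched: assume a core-related change, run everything.
            return list(ALL_TEST_TYPES)
    # Keep a stable order that follows ALL_TEST_TYPES.
    return [t for t in ALL_TEST_TYPES if t in selected]
```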








[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706677939


   Some of the process-related tests were flaky because of dropped connections to MySQL/Postgres, so I moved them to Quarantined. However, I have not yet seen a single transient error caused by resource problems (Exit 137), so it looks really good. Paired with the workaround I implemented for the "unknown blob" problem (#11411), we might finally be back to a reasonably stable state of tests.
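
For context on "Exit 137": shells and container runtimes conventionally report a process terminated by signal N as exit code 128 + N, so 137 corresponds to signal 9 (SIGKILL), which is what the kernel's out-of-memory killer sends. A small sketch of that decoding, assuming a Unix-like system (the helper name is hypothetical):

```python
import signal
from typing import Optional


def signal_from_exit_code(exit_code: int) -> Optional[str]:
    """Return the signal name for a shell-style exit code above 128, else None."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # The number does not correspond to a signal known on this platform.
        return None
```

A job that ends with `signal_from_exit_code(code) == "SIGKILL"` was killed from outside rather than failing on an assertion, which is why such failures point at resource limits rather than at the tests themselves.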





[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706719998


   Just look out for #11417! This will be the killer one. 





[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706748776


   I have already run several hundred of those tests without a single intermittent problem so far. I have really high hopes for this one!








[GitHub] [airflow] dimberman commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706714745


   Yes, exactly. Before, we couldn't do this because we'd have had to rebuild the image a bunch of times, but I think this will be great for reducing strain on the CI.





[GitHub] [airflow] github-actions[bot] commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706573197


   [The Workflow run](https://github.com/apache/airflow/actions/runs/299356179) is cancelling this PR. Building image for the PR has been cancelled





[GitHub] [airflow] ashb commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
ashb commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706748036


   Nice one!





[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706697720


   Success! I got a green build. There are a few follow-up tasks with Quarantined tests that need some love, plus implementing full selective tests (I am running some final tests with it), and we might be back in the Green CI business.
   
   Looking forward to reviews, but I think it's going to be quite a game-changer.





[GitHub] [airflow] dimberman merged pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
dimberman merged pull request #11402:
URL: https://github.com/apache/airflow/pull/11402


   








[GitHub] [airflow] potiuk edited a comment on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk edited a comment on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706665870


   Hey Everyone. I think with this change we have a chance to finally reach stability of the tests.
   
   I split the tests into multiple jobs, and I think I have a very nice split. I believe that with the split I introduced, we will hit resource limitations far less frequently (even if we run many more jobs). Each of the test jobs runs no longer than ~7 minutes (including pulling the image built once in the separate workflow). Additionally, I've added all the test types to Breeze, so that it will be really easy to reproduce every test type locally. If you have the image built locally with Breeze (it is also built only once), it takes up to 4-5 minutes (or less, depending on your machine type) to re-run the complete failed set for the given test type and reproduce the failure; you can then enter Breeze and reproduce the failures one by one.
   
   Additionally, when a test job fails, I print some useful instructions on how to reproduce the failed run locally. In case you have not noticed, it is now very easy to reproduce a failed build from CI using the RUN_ID from GitHub Actions (you can pull the very image that was used to run the tests). This information is now printed when tests fail, so it is immediately visible to the author and committer, and reproducing the failed tests will be a ..... BREEZE.
   
   ```
   *******************************************************************************************************
   *
   * ERROR! Some tests failed, unfortunately. Those might be transient errors,
   *        but usually you have to fix something.
   *        See the above log for details.
   *
   *******************************************************************************************************
   *  You can easily reproduce the failed tests on your dev machine.
   *
   *   When you have the source branch checked out locally:
   *
   *     Run all tests:
   *
   *       ./breeze --backend postgres --python 3.6 --db-reset --test-type Core tests
   *
   *     Enter docker shell:
   *
   *       ./breeze --backend postgres --python 3.6 --db-reset --test-type Core shell
   *
   *   When you do not have sources:
   *
   *     Run all tests:
   *
   *       ./breeze --github-image-id NNNNNNNN --backend postgres --python 3.6 --db-reset --test-type Core tests
   *
   *     Enter docker shell:
   *
   *       ./breeze --github-image-id NNNNNNNN --backend postgres --python 3.6 --db-reset --test-type Core shell
   *
   *
   *   NOTE! Once you are in the docker shell, you can run a failed test with:
   *
   *            pytest [TEST_NAME]
   *
   *   You can copy the test name from the output above
   *
   *******************************************************************************************************
   ```
   
   
   
   
   
   





[GitHub] [airflow] potiuk commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706721329


   BTW, @dimberman: this is where I wanted to get with CI when I joined the project ~2 years ago :).
   
   Pretty much ALL the work I've done with Breeze and CI was to reach this very point, where we can do this and massively speed CI up.
   
   It was maaaaaaaaany PRs to get us here :).
   
   I am especially happy that it will now be so easy and straightforward to reproduce any failure locally: when one of those jobs fails for a good reason, it's literally one command to reproduce the failed build and another to enter the container and re-run the test.
   
   It should now take just a few minutes to reproduce any failure, and we even show you in the logs how to do it with Breeze.








[GitHub] [airflow] dimberman commented on pull request #11402: Split tests to more sub-types

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #11402:
URL: https://github.com/apache/airflow/pull/11402#issuecomment-706714449


   @potiuk Oh heck yes! I was about to suggest exactly this! 

