You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/10/12 15:58:54 UTC

[GitHub] [airflow] potiuk opened a new pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

potiuk opened a new pull request #11474:
URL: https://github.com/apache/airflow/pull/11474


   This PR adds dependencies between postgres/mysql/sqlite jobs
   which is the simplest way to avoid starvation of job cancelling.
   
   Currently several PRs with > 120 jobs might easily saturate the
   queue of requests and it means that when there are many PRs
   queueing, they might starve "workflow_run" jobs - those jobs
   are responsible for cancelling other duplicate/failed requests,
   so if they are not run for a while the jobs that should be
   cancelled might be cancelled much later than they should,
   probably already when they used a lot of build time.
   
   By adding dependencies between the different database tests,
   we are adding a bit artifficial delays between the steps - this
   means that a single PR mght run slower, but it also means that
   it will not starve cancel jobs from the new PRs and they might
   have a chance to clean the duplicate/failed jobs more efficiently.
   
   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   
   ---
   **^ Add meaningful description above**
   
   Read the **[Pull Request Guidelines](https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#pull-request-guidelines)** for more information.
   In case of fundamental code change, Airflow Improvement Proposal ([AIP](https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+Improvements+Proposals)) is needed.
   In case of a new dependency, check compliance with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).
   In case of backwards incompatible changes please leave a note in [UPDATING.md](https://github.com/apache/airflow/blob/master/UPDATING.md).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503407117



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       If this one causes too many troubles I will implement it this way ^^




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503415941



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       Shouldn't that still be under our limit of 5000?
   
   i.e 20 builds in an hour -> 20 *100 jobs per build = 2000  
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503416984



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       > I would rather suggest using that instead of making these jobs serial. It will just add more time and would not take advantage of the recently merged optimizations
   
   They will still use it, especially when there are more jobs in the queue. But let's see how canceling duplicate masters (already merged) will help it, and then we can see what's next. I would like to see this one working and see what effect it will have on a busy system though before i implement external canceling.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503413893



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       BTW. We are really stressing GA today. We have about 11 PRs + master builds now in the queue - and we had like 16 or 17. In total, we had ~ 70 builds today in 12 hours. This is quite a lot - usually we have that much in 24 hours so today seems to be a "peak" day. What adds up is that we disabled master canceling and we had 13 full master push builds today. Usually we have ca.  6-10 /day (and a lot of them cancelled)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503422454



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       I checked at Beaam: https://github.com/apache/beam and Skywalking: https://github.com/apache/skywalking where I helped to consult GA introduction and seems they also have a lot of delays. Opening issue to GA support.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503419081



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       BTW. I think what the real problem is now is that we have maybe 20 jobs running now not 150 - it looks like at least. so we might start being starved by other Apache projects like we had with Travis.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503408874



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       I would rather suggest using that instead of making these jobs serial. It will just add more time and would not take advantage of the recently merged optimizations




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503424396



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       Ticket created: https://support.github.com/ticket/personal/0/867235




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#issuecomment-707832938


   This has been actually implemented in #11504 - the way I implemented it there is a bit different - rather than wait between different database tests, sequential processing is done for tests types. This approach is much more efficient because we can re-use the machine we already have (cleaning the docker state in-between tests), we loose even less time for pulling the image (the image is pulled once per all test types for the same python/db combination)
   
   By using docker, we can not only effectively clean-up between the states but also we  can print some resources stats beteween the tests, that will help us to know if the problems come from memory or disk or other resource limits. 
   
   Closing it as it was a bad idea to do it this way.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503395401



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       So these tests would run sequentially now?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk closed pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk closed pull request #11474:
URL: https://github.com/apache/airflow/pull/11474


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503413893



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       BTW. We are really stressing GA today. We have about 11 PRs + master builds now in the queue - and we had like 16 or 17. In total, we had ~ 70 builds today in 12 hours. This is quite a lot - usually we have that much in 24 hours so today seems to be a "peak" day. What adds up is that we disabled master canceling and we had 13 full master push builds today. Usually we have ca.  6-10 /day.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503406602



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       Aa discussed with Ash - without solving canceling on "scheduler" level by GitHub, we can at most move between "faster builds" and "snappier cancels" . GitHubs works on it:
   
   ```
   Hi Jarek,
   Thank you for sending this in. We have an open feature request for this and have added your feedback.
   I cannot say if or when this will be implemented. Keep an eye on the release notes or the GitHub Blog for information on when/if this makes it into the product.
   Cheers,
   Olliec
   ```
   
   And we also might want to have an external job with a token that has only "workflow" scope, canceling the builds.<
   I actually thought about this and I might want to do just that - I can modify my "cancel" action slightly to be able to cancel builds from another project (and use a personal token for that with only "workflow" scope and nothing else). That should be rather safe. 
   
   I can run scheduled Github Action workflow from a personal project (so that it does not compete with Apache Queue). Should be rather safe. I only have to take care about rate limiting. Personal projects have 1000 requests/hr limit (much lower than 5000 we have) and I already exceeded it several times when running the cancel action too frequently. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] potiuk commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
potiuk commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503398005



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       Correct. They will run in "per-db" "batches" .  It's not as bad as it seems @dimberman - they will slow down an individual PR if the queue is empty (I think we have up to 100-150 slots in our Queue, not sure exactly). But when there are like 10 PRs (as we currently have) competing for the queue and for example one of them fails in the "static check" phase, there is much higher chance that a new PR or one of those waiting will run "cancel" job much faster and free the queue, which will make the "good" PRs complete faster.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] kaxil commented on a change in pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
kaxil commented on a change in pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#discussion_r503415941



##########
File path: .github/workflows/ci.yml
##########
@@ -363,7 +363,7 @@ jobs:
     timeout-minutes: 60
     name: "${{matrix.test-type}}:MySQL${{matrix.mysql-version}}, Py${{matrix.python-version}}"
     runs-on: ubuntu-latest
-    needs: [build-info, ci-images]
+    needs: [build-info, ci-images, tests-postgres]

Review comment:
       Shouldn't that still be under our limit of 5000?
   
   i.e 20 builds in an hour -> 20 *100 jobs per build = 2000  
   
   Even if 150 jobs per build, we would be around 3000
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] dimberman commented on pull request #11474: Adds dependencies between test jobs to avoid cancel starvation

Posted by GitBox <gi...@apache.org>.
dimberman commented on pull request #11474:
URL: https://github.com/apache/airflow/pull/11474#issuecomment-707206541


   @potiuk won't this make runs take a really long time or am I missing something?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org