Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/03/30 03:41:11 UTC
[GitHub] [airflow] dviru opened a new issue #22612: Schedular going down for 1-2 minute on every 10 minute as increase completed pods in EKS
dviru opened a new issue #22612:
URL: https://github.com/apache/airflow/issues/22612
### Apache Airflow version
2.2.4 (latest released)
### What happened
Hi Team, I am using Airflow 2.2.4, deployed on an AWS EKS cluster. Every 5-10 minutes, a "scheduler down" message appears in the Airflow UI. When I checked the Airflow scheduler log, I saw a lot of statements like the one below.
`[2022-03-21 08:21:21,640] {kubernetes_executor.py:729} INFO - Attempting to adopt pod sampletask.05b6f567b4a64bd5beb16e526ba94d7a`
The statement above is printed for every completed pod that exists in EKS, but it repeats multiple times and also invokes the PATCH API each time.
As I understand it, the code below fetches the details of all completed pods from the EKS cluster on every pass and invokes the PATCH API on each completed pod. For 1,000 completed pods this finishes in about 1 minute; for 7,000 completed pods it takes 3-5 minutes, and that is why the scheduler goes down.
<img width="1054" alt="160352813-9ff57de3-782f-4cee-8f7c-f6d5b8a60d29" src="https://user-images.githubusercontent.com/10843400/160741990-838f15e2-485c-4c9a-8ca7-c7014e14f0b4.png">
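To make the pattern described above concrete, here is a minimal, self-contained sketch. It is not the Airflow source: `FakePod`, `FakeKubeClient`, `adopt_completed_pods`, and the `owner` label are hypothetical names invented for illustration, and the fake client only counts calls instead of talking to a cluster. It shows why the work recurs: if every pass lists all completed pods and patches each one, the number of PATCH calls per pass equals the number of completed pods, no matter how many earlier passes already patched them.

```python
from dataclasses import dataclass, field


@dataclass
class FakePod:
    """Hypothetical stand-in for a completed worker pod."""
    name: str
    labels: dict = field(default_factory=dict)


class FakeKubeClient:
    """Hypothetical stand-in for a Kubernetes client; it only counts PATCH calls."""

    def __init__(self, completed_pods):
        self._pods = list(completed_pods)
        self.patch_calls = 0

    def list_completed_pods(self):
        # Assumption: every pass sees all completed pods still in the cluster.
        return list(self._pods)

    def patch_pod(self, pod, labels):
        # One simulated PATCH request per pod.
        self.patch_calls += 1
        pod.labels.update(labels)


def adopt_completed_pods(client, scheduler_id):
    """One adoption pass: patch every completed pod, even if already patched."""
    for pod in client.list_completed_pods():
        client.patch_pod(pod, {"owner": scheduler_id})  # "owner" is an illustrative label


if __name__ == "__main__":
    client = FakeKubeClient(FakePod(f"sampletask.{i}") for i in range(1000))
    for _ in range(3):  # three passes over the same 1,000 completed pods
        adopt_completed_pods(client, "scheduler-1")
    print(client.patch_calls)
```

Under these assumptions the per-pass cost grows with the number of completed pods left in the cluster, which is consistent with the longer stalls reported at 7,000 pods than at 1,000.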
### What you think should happen instead
The scheduler stays healthy when we set "delete_worker_pods = True". But when "delete_worker_pods = False" is set and the completed pod count reaches 7,000 to 10,000, the scheduler goes down.
The scheduler should be healthy irrespective of how many completed pod exist in EKS cluster.
### How to reproduce
Deploy Airflow in a k8s cluster and set "delete_worker_pods = False". Once the number of completed pods reaches 7,000 to 10,000, you will be able to see this issue.
### Operating System
OS:Debian GNU/Linux, VERSION: 10
### Versions of Apache Airflow Providers
_No response_
### Deployment
Other Docker-based deployment
### Deployment details
_No response_
### Anything else
_No response_
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] boring-cyborg[bot] commented on issue #22612: Schedular going down for 1-2 minute on every 10 minute as increase completed pods in EKS
Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on issue #22612:
URL: https://github.com/apache/airflow/issues/22612#issuecomment-1082588605
Thanks for opening your first issue here! Be sure to follow the issue template!
[GitHub] [airflow] potiuk commented on issue #22612: Schedular going down for 1-2 minute on every 10 minute as increase completed pods in EKS
Posted by GitBox <gi...@apache.org>.
potiuk commented on issue #22612:
URL: https://github.com/apache/airflow/issues/22612#issuecomment-1082966752
cc: @dstandish -> what we talked about :)