Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 14:16:21 UTC
[GitHub] [airflow] OliverPfau opened a new issue #13637: Scheduler takes 100% of CPU without task execution
OliverPfau opened a new issue #13637:
URL: https://github.com/apache/airflow/issues/13637
Hi,
Running with Python 3.6.9, the scheduler consumes a lot of CPU time without executing any task:
  PID USER  PR NI   VIRT   RES   SHR S  %CPU %MEM    TIME+ COMMAND
15758 oli   20  0  42252  3660  3124 R 100.0  0.0  0:00.06 top
16764 oli   20  0 590272 90648 15468 R 200.0  0.3  0:00.59 airflow schedul
16769 oli   20  0 588808 77236 13900 R 200.0  0.3  0:00.55 airflow schedul
    1 root  20  0   1088   548   516 S   0.0  0.0  0:13.28 init
   10 root  20  0    900    80    16 S   0.0  0.0  0:00.00 init
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [airflow] yuzeh commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
yuzeh commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-831002551
To add to @bobo333's comment, here's the before and after on CPU utilization:
![image](https://user-images.githubusercontent.com/351023/116839049-a27c7e80-ab85-11eb-9c84-fd4150723456.png)
Other info about our deployment:
- Python 3.7
- Everything runs in docker-compose with a custom built docker image
[GitHub] [airflow] linxiaohui commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
linxiaohui commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-791086852
Hi everyone, I am using apache-airflow 2.0.1 (with CeleryExecutor), and I have set `min_file_process_interval = 60` and `parsing_processes = 2`, but the scheduler's CPU usage is still very high (according to `top`):
  PID USER PR NI    VIRT    RES   SHR S %CPU      %MEM TIME+   COMMAND
32057 root 20  0 1753140 121540 11728 R **138.7**  0.0 0:04.19 airflow schedul
32093 root 20  0 1578452  61000  4812 R **90.4**   0.0 0:02.73 airflow schedul
Any help? Thanks a lot.
[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892768703
Setting the following helped immensely:
```
[core]
min_serialized_dag_update_interval = 600
min_serialized_dag_fetch_interval = 300
[scheduler]
min_file_process_interval = 600
processor_poll_interval = 5
parsing_processes = 2
```
Some of these values are tuned for our use case; I believe `min_file_process_interval = 600` might be the minimal required change.
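For deployments that configure Airflow through environment variables (for example the docker-compose setups mentioned elsewhere in this thread), the same settings can be expressed using Airflow's `AIRFLOW__{SECTION}__{KEY}` naming convention. The following is a minimal sketch that derives those names from the fragment above; the helper name `to_env_vars` is hypothetical and is not part of Airflow.

```python
import configparser

# The settings quoted in the comment above, as an airflow.cfg fragment.
AIRFLOW_CFG_FRAGMENT = """
[core]
min_serialized_dag_update_interval = 600
min_serialized_dag_fetch_interval = 300

[scheduler]
min_file_process_interval = 600
processor_poll_interval = 5
parsing_processes = 2
"""


def to_env_vars(cfg_text: str) -> dict:
    """Map each [section] key to its AIRFLOW__SECTION__KEY environment variable.

    Hypothetical helper for illustration; it only rewrites names and does not
    check that a key is valid for a given Airflow version.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    env = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            env[f"AIRFLOW__{section.upper()}__{key.upper()}"] = value
    return env


if __name__ == "__main__":
    for name, value in to_env_vars(AIRFLOW_CFG_FRAGMENT).items():
        print(f"{name}={value}")
```

Whether each key exists under that section depends on the Airflow version in use, so the generated names should be checked against the configuration reference for that version.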
[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892019154
We upgraded from `airflow` `1.10.15` to `2.0.2` and are experiencing this. @kaxil, configs:
* `[scheduler] parsing_processes`: `2`.
* `[scheduler] min_file_process_interval`: `60`.
[GitHub] [airflow] sreeram004 commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
sreeram004 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-886045650
I'm seeing the same issue in Airflow 2.1.2. Are there any known fixes?
[GitHub] [airflow] veinkr commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
veinkr commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-1080843971
Hi, it seems that the issue still exists.
I'm using Airflow 2.2.4 with Python 3.9 locally.
`airflow scheduler - DagFileProcessor xxx.py`
It shows up in my `top` every second with a **different PID**. Does it create a new process every second?
My config is below.
```
[core]
min_serialized_dag_update_interval = 600
min_serialized_dag_fetch_interval = 300
[scheduler]
min_file_process_interval = 600
parsing_processes = 2
```
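One way to check whether new `DagFileProcessor` processes really appear each second is to compare PID snapshots taken a second apart. The sketch below assumes a POSIX `ps` that accepts `-eo pid,args` and that the process title contains the string `DagFileProcessor`, as in the `top` line quoted above; the function names are hypothetical.

```python
import subprocess
import time


def dag_processor_pids(ps_output: str) -> set:
    """Return PIDs from `ps -eo pid,args` output whose command mentions DagFileProcessor."""
    pids = set()
    for line in ps_output.splitlines():
        parts = line.split(None, 1)  # split into "<pid>" and "<rest of command line>"
        if len(parts) == 2 and parts[0].isdigit() and "DagFileProcessor" in parts[1]:
            pids.add(int(parts[0]))
    return pids


def snapshot() -> set:
    """Take one snapshot of running DagFileProcessor PIDs (assumes a POSIX `ps`)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,args"], capture_output=True, text=True, check=True
    )
    return dag_processor_pids(result.stdout)


def watch(seconds: int = 10) -> None:
    """Print the PIDs that are new in each one-second interval."""
    previous = snapshot()
    for _ in range(seconds):
        time.sleep(1)
        current = snapshot()
        print("new PIDs:", sorted(current - previous))
        previous = current
```

Calling `watch()` on the scheduler host prints a non-empty list for every interval in which a processor with a previously unseen PID was running.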
[GitHub] [airflow] kaxil commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-759435015
This will be fixed by default in 2.0.1 (the behavior has always been there, even in 1.10.x). For now, you can set `[scheduler] min_file_process_interval` to `1` or `2`.
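To get a feel for why this setting matters, a back-of-the-envelope bound can help. The sketch below uses a deliberately simplified model (an assumption, not Airflow's actual scheduling logic): each DAG file is parsed at most once per `min_file_process_interval` seconds. The function name and the file count of 10 are hypothetical.

```python
import math


def max_parses_per_hour(num_dag_files: int, min_file_process_interval: float) -> float:
    """Rough upper bound on DAG file parses per hour.

    Simplified model (assumption): each file is parsed at most once every
    `min_file_process_interval` seconds. An interval of 0 or less means no
    throttle at all, so the bound is infinite.
    """
    if min_file_process_interval <= 0:
        return math.inf
    return num_dag_files * 3600 / min_file_process_interval


if __name__ == "__main__":
    # Intervals mentioned in this thread, applied to a hypothetical 10 DAG files.
    for interval in (1, 60, 600):
        print(interval, max_parses_per_hour(10, interval))
```

Under this model, moving from an interval of 1 second to 600 seconds lowers the bound by a factor of 600, which is consistent in direction with the reports in this thread that larger intervals reduce scheduler CPU usage.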
[GitHub] [airflow] kaxil commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-872613748
@david30907d What are the values for the following?
- `[scheduler] parsing_processes`
- `[scheduler] min_file_process_interval`
And how much CPU is allocated to the scheduler pod?
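When answering a question like this, it is worth reporting the value that is actually in effect, since an `AIRFLOW__{SECTION}__{KEY}` environment variable takes precedence over `airflow.cfg`. The sketch below models only those two sources (Airflow has further sources, such as built-in defaults, that are ignored here); `effective_value` is a hypothetical helper. On an Airflow 2 installation, `airflow config get-value scheduler parsing_processes` should report the resolved value directly.

```python
import configparser
import os


def effective_value(cfg_text, section, key, environ=None):
    """Resolve a setting from an environment mapping first, then from airflow.cfg text.

    Hypothetical helper: only the environment-variable override and the config
    file are modelled. Returns None when the key is found in neither.
    """
    environ = os.environ if environ is None else environ
    env_name = f"AIRFLOW__{section.upper()}__{key.upper()}"
    if env_name in environ:
        return environ[env_name]
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return parser.get(section, key, fallback=None)
```

For example, passing the text of `airflow.cfg` together with `"scheduler"` and `"min_file_process_interval"` returns the environment override if one is set, and the file value otherwise.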
[GitHub] [airflow] bobo333 commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
bobo333 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-831001974
Hi, we (@yuzeh and I) are also seeing this behavior on Airflow `2.0.2` using `LocalExecutor`. With `min_file_process_interval = 60` and `parsing_processes = 2`, CPU usage for the scheduler is very similar to what @linxiaohui posted: both processes use 90% CPU or higher, often up to 150%.
There are no active workloads running during this time either; those only run on the hour, but this behavior happens constantly.
[GitHub] [airflow] david30907d commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-855172566
Hi, is there any update on this issue?
I'm using Airflow 2.1.0 with Python 3.8 on k8s (Celery executor), and the scheduler pod also consumes about 90% of CPU, as others mentioned above.
[GitHub] [airflow] kaxil closed issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13637:
URL: https://github.com/apache/airflow/issues/13637
[GitHub] [airflow] gmontanola commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
gmontanola commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-759422660
I can confirm this too: Airflow 2.0.0 with Python 3.8, KubernetesExecutor on EKS with Kubernetes 1.18,
on an **m5.large node** with nothing but Airflow running (no task executions besides some tests).
![image](https://user-images.githubusercontent.com/35972814/104453292-f5083f80-5582-11eb-851f-c8ada92b8181.png)
I noticed the same thing while using Kind locally, but with Minikube it was OK! I'll test and come back with results just to be sure.
[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892225483
In our case we get n processes, where n is determined by `parsing_processes`. `top` shows:
```
airflow scheduler - DagFileProcessor /mydags/dag1.py
```
All of them have very high CPU usage. I wonder what that's for. Is it "re-parsing" the DAGs to try to pick up updates in the code?
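Since `DagFileProcessor` is the process that parses DAG files, one rough way to gauge the cost of each re-parse is to time the execution of a DAG file's top-level code. The sketch below does this with the standard library only; it is an approximation and not how the scheduler itself loads files, and `time_dag_file_import` is a hypothetical helper name.

```python
import importlib.util
import time


def time_dag_file_import(path: str) -> float:
    """Return the seconds taken to execute the top-level code of the Python file at `path`.

    Approximation only: the scheduler's own loading adds further work
    (for example collecting and serializing DAG objects) that is not measured here.
    """
    spec = importlib.util.spec_from_file_location("dag_under_test", path)
    module = importlib.util.module_from_spec(spec)
    start = time.perf_counter()
    spec.loader.exec_module(module)  # runs the file's top-level statements
    return time.perf_counter() - start
```

A file such as `/mydags/dag1.py` that takes a noticeable fraction of a second here will cost at least that much CPU time on every parse, so heavy imports or database calls at module level are worth looking for.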
[GitHub] [airflow] bobo333 edited a comment on issue #13637: Scheduler takes 100% of CPU without task execution
Posted by GitBox <gi...@apache.org>.
bobo333 edited a comment on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-831001974
Note: this was _not_ happening before we upgraded to Airflow 2. On `1.10.14` we did not see this level of CPU usage from the scheduler.