Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 14:16:21 UTC

[GitHub] [airflow] OliverPfau opened a new issue #13637: Scheduler takes 100% of CPU without task execution

OliverPfau opened a new issue #13637:
URL: https://github.com/apache/airflow/issues/13637


   Hi,
   
   Running with Python 3.6.9, the scheduler consumes a lot of CPU time without executing any task:
   
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
   15758 oli       20   0   42252   3660   3124 R 100.0  0.0   0:00.06 top
   16764 oli       20   0  590272  90648  15468 R 200.0  0.3   0:00.59 airflow schedul
   16769 oli       20   0  588808  77236  13900 R 200.0  0.3   0:00.55 airflow schedul
       1 root      20   0    1088    548    516 S   0.0  0.0   0:13.28 init
      10 root      20   0     900     80     16 S   0.0  0.0   0:00.00 init


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] yuzeh commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
yuzeh commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-831002551


   To add to @bobo333's comment, here's the before and after on CPU utilization:
   
   ![image](https://user-images.githubusercontent.com/351023/116839049-a27c7e80-ab85-11eb-9c84-fd4150723456.png)
   
   Other info about our deployment:
   - Python 3.7
   - Everything runs in docker-compose with a custom built docker image 





[GitHub] [airflow] linxiaohui commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
linxiaohui commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-791086852


   Hi everyone, I am using apache-airflow 2.0.1 (with CeleryExecutor), and I have set `min_file_process_interval = 60` and `parsing_processes = 2`. But the scheduler's CPU usage is still very high (according to `top`):
   
     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND    
   32057 root      20   0 1753140 121540  11728 R **138.7**  0.0   0:04.19 airflow schedul                                                                                                          
   32093 root      20   0 1578452  61000   4812 R  **90.4**  0.0   0:02.73 airflow schedul  
   
   Any help? Thanks a lot.





[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892768703


   Setting the following helped immensely:
   
   ```
   [core]
   min_serialized_dag_update_interval = 600
   min_serialized_dag_fetch_interval = 300
   [scheduler]
   min_file_process_interval = 600
   processor_poll_interval = 5
   parsing_processes = 2
   ```
   
   Some of these values suit our particular use case; I believe `min_file_process_interval = 600` might be the minimal required change.
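
For the container-based deployments mentioned elsewhere in this thread (docker-compose, Kubernetes), the same settings can be passed as environment variables, since Airflow reads options named `AIRFLOW__{SECTION}__{KEY}`. The sketch below only converts the fragment above into that form; the helper name `to_env_vars` and the constant `AIRFLOW_CFG_FRAGMENT` are illustrative and not part of Airflow.

```python
import configparser

# The settings quoted in the comment above, copied as-is.
AIRFLOW_CFG_FRAGMENT = """
[core]
min_serialized_dag_update_interval = 600
min_serialized_dag_fetch_interval = 300
[scheduler]
min_file_process_interval = 600
processor_poll_interval = 5
parsing_processes = 2
"""


def to_env_vars(cfg_text):
    """Map every `[section] key = value` pair to AIRFLOW__SECTION__KEY=value.

    Hypothetical helper for illustration; it does not import or call Airflow.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    env = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            env[f"AIRFLOW__{section.upper()}__{key.upper()}"] = value
    return env


if __name__ == "__main__":
    # Print lines that could be pasted into an env file.
    for name, value in sorted(to_env_vars(AIRFLOW_CFG_FRAGMENT).items()):
        print(f"{name}={value}")
```

Whether every one of these options is needed depends on the deployment; the comment above suggests `min_file_process_interval` alone may be enough.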
   





[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892019154


   We upgraded from `airflow` `1.10.15` to `2.0.2` and are experiencing this. @kaxil, configs:
   
   *   `[scheduler] parsing_processes`: `2`.
   
   *   `[scheduler] min_file_process_interval`: `60`.
   





[GitHub] [airflow] sreeram004 commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
sreeram004 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-886045650


   I'm seeing the same issue in Airflow 2.1.2. Are there any known fixes?





[GitHub] [airflow] veinkr commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
veinkr commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-1080843971


   Hi, it seems that the issue still exists.
   I'm using Airflow 2.2.4 with Python 3.9 locally.
   `airflow scheduler - DagFileProcessor xxx.py`
   shows up in my `top` output every second with a **different PID**. Does it create a new process every second?
   
   My config is below.
   ```
   [core]
   min_serialized_dag_update_interval = 600
   min_serialized_dag_fetch_interval = 300
   [scheduler]
   min_file_process_interval = 600
   parsing_processes = 2
   ```





[GitHub] [airflow] kaxil commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-759435015


   It will be fixed by default in 2.0.1 (this has always been the case, even for 1.10.x). For now, you can set `[scheduler] min_file_process_interval = 1` or `2`.
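
To see why the interval matters, here is a back-of-envelope model, not Airflow's actual scheduling logic. It assumes each DAG file costs a fixed number of CPU seconds to parse and is re-parsed once per interval, and that an interval of 0 keeps the parsing processes busy continuously. The function name and all numbers are hypothetical.

```python
def estimated_parsing_cores(num_files, parse_seconds, interval_seconds, parsing_processes):
    """Rough estimate of how many CPU cores DAG parsing keeps busy on average.

    Simplifying assumptions (not taken from Airflow's source):
    - every file takes `parse_seconds` of CPU time per parse,
    - every file is re-parsed once per `interval_seconds`,
    - one file is parsed by one process at a time,
    - process start-up overhead is ignored.
    """
    if num_files <= 0 or parsing_processes <= 0:
        return 0.0
    # Parsing can never use more cores than there are processes or files.
    ceiling = float(min(parsing_processes, num_files))
    if interval_seconds <= 0:
        # No pause between parses: the processes loop continuously.
        return ceiling
    demand = num_files * parse_seconds / interval_seconds
    return min(ceiling, demand)


if __name__ == "__main__":
    # Hypothetical deployment: 20 DAG files, 0.5 s of CPU per parse, 2 processes.
    for interval in (0, 1, 2, 60, 600):
        cores = estimated_parsing_cores(20, 0.5, interval, 2)
        print(f"interval={interval:>3}s -> about {cores:.2f} cores busy")
```

Under these assumptions, short intervals saturate every parsing process whenever the total parse time exceeds the interval, which would be consistent with later reports in this thread that larger values reduced CPU usage.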





[GitHub] [airflow] kaxil commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
kaxil commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-872613748


   @david30907d What are the values for the following?
   
    - `[scheduler] parsing_processes`
    - `[scheduler] min_file_process_interval`
   
   And how much CPU is allocated to the scheduler pod?





[GitHub] [airflow] david30907d commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
david30907d commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-855172566


   Hi, is there any update on this issue?
   I'm using Airflow 2.1.0 with Python 3.8 on k8s (Celery executor), and the scheduler pod also consumes about 90% of its CPU resources, as others mentioned above.





[GitHub] [airflow] kaxil closed issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
kaxil closed issue #13637:
URL: https://github.com/apache/airflow/issues/13637


   





[GitHub] [airflow] gmontanola commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
gmontanola commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-759422660


   I can confirm this too. Airflow 2.0.0 with Python3.8, KubernetesExecutor on EKS with Kubernetes 1.18
   
   **m5.large node** and nothing but Airflow running (no task executions besides some tests)
   
   ![image](https://user-images.githubusercontent.com/35972814/104453292-f5083f80-5582-11eb-851f-c8ada92b8181.png)
   
   I noticed the same thing while using Kind locally, but with Minikube it was ok! I'll test and come back with results just to be sure.
   





[GitHub] [airflow] fmv1992 commented on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
fmv1992 commented on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-892225483


   In our case we get n processes determined by `parsing_processes`. `top` gives:
   
   ```
   airflow scheduler - DagFileProcessor /mydags/dag1.py
   ```
   
   All of them have very high CPU usage. I wonder what that's for. Is it "re-parsing" the DAGs to try to pick up updates in the code?
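
If the processes are indeed re-parsing DAG files, it helps to know how expensive a single parse is. The sketch below times the top-to-bottom execution of each Python file in a folder, which roughly approximates the work of loading a DAG file. The helper names and the `/mydags` path reused from the `top` output above are illustrative. Real DAG files would need Airflow importable in the same environment, and this is not how the scheduler itself measures parse time.

```python
import runpy
import time
from pathlib import Path


def time_file_execution(path):
    """Return the wall-clock seconds needed to execute one Python file top to bottom."""
    start = time.perf_counter()
    runpy.run_path(str(path))
    return time.perf_counter() - start


def slowest_files(folder, limit=5):
    """Time every *.py file directly inside `folder` and return the slowest ones.

    The result is a list of (seconds, file_name) tuples, slowest first.
    """
    timings = [
        (time_file_execution(path), path.name)
        for path in sorted(Path(folder).glob("*.py"))
    ]
    return sorted(timings, reverse=True)[:limit]


if __name__ == "__main__":
    # Hypothetical DAG folder; replace with your own dags_folder.
    for seconds, name in slowest_files("/mydags"):
        print(f"{seconds:8.3f}s  {name}")
```

Files that take a long time here are natural candidates for moving heavy imports or top-level computation out of module scope.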





[GitHub] [airflow] bobo333 edited a comment on issue #13637: Scheduler takes 100% of CPU without task execution

Posted by GitBox <gi...@apache.org>.
bobo333 edited a comment on issue #13637:
URL: https://github.com/apache/airflow/issues/13637#issuecomment-831001974


   Hi, we (@yuzeh and I) are also seeing this behavior on Airflow `2.0.2` using `LocalExecutor`. With `min_file_process_interval = 60` and `parsing_processes = 2`, CPU usage for the scheduler is very similar to what @linxiaohui posted: both processes use 90% CPU or higher, often up to 150%.
   
   There are no active workloads running during this time either; those only run on the hour, but this behavior happens constantly.
   
   Note: this was _not_ happening before we upgraded to airflow 2. On `1.10.14` we did not see this level of cpu usage from the scheduler.

