Posted to commits@airflow.apache.org by ka...@apache.org on 2021/01/14 13:08:24 UTC

[airflow] branch master updated: Increase the default ``min_file_process_interval`` to decrease CPU Usage (#13664)

This is an automated email from the ASF dual-hosted git repository.

kaxilnaik pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/master by this push:
     new e4b8ee6  Increase the default ``min_file_process_interval`` to decrease CPU Usage (#13664)
e4b8ee6 is described below

commit e4b8ee63b04a25feb21a5766b1cc997aca9951a9
Author: Kaxil Naik <ka...@gmail.com>
AuthorDate: Thu Jan 14 13:08:12 2021 +0000

    Increase the default ``min_file_process_interval`` to decrease CPU Usage (#13664)
    
    With the previous default of `0`, the CPU Usage mostly stays around 100%.
    As in Airflow 2.0.0, the scheduling decisions have been moved out from
    DagFileProcessor to Scheduler, we can keep this number high.
    
    closes https://github.com/apache/airflow/issues/13637
---
 UPDATING.md                                  | 9 +++++++++
 airflow/config_templates/config.yml          | 6 ++++--
 airflow/config_templates/default_airflow.cfg | 6 ++++--
 3 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/UPDATING.md b/UPDATING.md
index 1cf2a6c..374a521 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -59,6 +59,15 @@ However, it was unintentionally changed to `8` in 2.0.0.
 
 From Airflow 2.0.1, we revert to the old default of `16`.
 
+### Default `[scheduler] min_file_process_interval` is changed to `30`
+
+The default value for `[scheduler] min_file_process_interval` was `0`,
+due to which the CPU Usage mostly stayed around 100% as the DAG files are parsed
+constantly.
+
+From Airflow 2.0.0, the scheduling decisions have been moved from
+DagFileProcessor to Scheduler, so we can keep the default a bit higher: `30`.
+
 ## Airflow 2.0.0
 
 ### The experimental REST API is disabled by default
diff --git a/airflow/config_templates/config.yml b/airflow/config_templates/config.yml
index d475ce7..1fc16e1 100644
--- a/airflow/config_templates/config.yml
+++ b/airflow/config_templates/config.yml
@@ -1648,11 +1648,13 @@
       default: "1"
     - name: min_file_process_interval
       description: |
-        after how much time (seconds) a new DAGs should be picked up from the filesystem
+        Number of seconds after which a DAG file is parsed. The DAG file is parsed every
+        ``min_file_process_interval`` number of seconds. Updates to DAGs are reflected after
+        this interval. Keeping this number low will increase CPU usage.
       version_added: ~
       type: string
       example: ~
-      default: "0"
+      default: "30"
     - name: dag_dir_list_interval
       description: |
         How often (in seconds) to scan the DAGs directory for new files. Default to 5 minutes.
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 458b606..f03dbca 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -814,8 +814,10 @@ num_runs = -1
 # The number of seconds to wait between consecutive DAG file processing
 processor_poll_interval = 1
 
-# after how much time (seconds) a new DAGs should be picked up from the filesystem
-min_file_process_interval = 0
+# Number of seconds after which a DAG file is parsed. The DAG file is parsed every
+# ``min_file_process_interval`` number of seconds. Updates to DAGs are reflected after
+# this interval. Keeping this number low will increase CPU usage.
+min_file_process_interval = 30
 
 # How often (in seconds) to scan the DAGs directory for new files. Default to 5 minutes.
 dag_dir_list_interval = 300
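
The effect of the new default can be illustrated with a minimal sketch. This is not Airflow's actual implementation; the class `FileParseThrottle` and its methods are hypothetical names chosen for illustration. It assumes only what the description above states: a DAG file is re-parsed no sooner than ``min_file_process_interval`` seconds after its previous parse, so a value of `0` makes every file eligible on every loop.

```python
import time
from typing import Callable, Dict, List


class FileParseThrottle:
    """Hypothetical helper (not an Airflow class): tracks when each DAG file
    was last parsed and reports which files are due for re-parsing."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Plays the role of ``min_file_process_interval`` (seconds).
        self.min_interval = min_interval
        self._clock = clock
        self._last_parsed: Dict[str, float] = {}

    def due(self, paths: List[str]) -> List[str]:
        """Return the files that were never parsed, or whose last parse is
        at least ``min_interval`` seconds old."""
        now = self._clock()
        ready = []
        for path in paths:
            last = self._last_parsed.get(path)
            if last is None or now - last >= self.min_interval:
                ready.append(path)
        return ready

    def mark_parsed(self, path: str) -> None:
        """Record that ``path`` has just been parsed."""
        self._last_parsed[path] = self._clock()
```

With `min_interval=0` every file is due on every call, which corresponds to the constant parsing described in the commit message; with `30`, a file that was just parsed is skipped until 30 seconds have passed, so edits to that file may take up to that long to be picked up.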