Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/04 23:01:32 UTC

[GitHub] [airflow] hussein-awala commented on a diff in pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

hussein-awala commented on code in PR #28711:
URL: https://github.com/apache/airflow/pull/28711#discussion_r1061955624


##########
airflow/jobs/scheduler_job.py:
##########
@@ -151,6 +151,7 @@ def __init__(
         # How many seconds do we wait for tasks to heartbeat before mark them as zombies.
         self._zombie_threshold_secs = conf.getint("scheduler", "scheduler_zombie_task_threshold")
         self._standalone_dag_processor = conf.getboolean("scheduler", "standalone_dag_processor")
+        self._is_dag_processor_activated = conf.getint("scheduler", "dag_dir_list_interval") >= 0

Review Comment:
   As I understand it, setting `standalone_dag_processor` to True does not create the standalone processor automatically. It only tells the scheduler not to create a dag processor in a new thread, so we then need to run the dag processor ourselves in a separate pod/container/process using the Airflow CLI. If we don't run it, all the dags will be considered stale after `dag_stale_not_seen_duration` seconds, and they will be deleted from the metadata database.
   
   With this PR, we can disable the dag file processor agent created in the scheduler process and run the standalone dag processor only when we need to process our dag files, without any risk of deleting the dags from the metadata database.
   The CLI has no condition on `dag_dir_list_interval`, so the `DagFileProcessorManager` is created normally. With a negative value, `if elapsed_time_since_refresh > self.dag_dir_list_interval` is always True, which is equivalent to providing 0 or a very small value. In addition, if we run the standalone dag processor in a custom process (without using the helm chart, e.g. in a CI pipeline), we can provide a different conf value to control the interval between dag directory listings.
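
To make the two conditions concrete, here is a minimal sketch in plain Python with no Airflow imports. The `SchedulerFlags` class and both function names are hypothetical and exist only for illustration. The `>= 0` check mirrors the line added in the diff, and the `>` comparison mirrors the refresh condition quoted above. How the scheduler combines the flag with `standalone_dag_processor` is an assumption based on the description in this comment, not a copy of the PR code.

```python
from dataclasses import dataclass


@dataclass
class SchedulerFlags:
    """Hypothetical stand-in for the two scheduler settings discussed here."""

    standalone_dag_processor: bool
    dag_dir_list_interval: int

    @property
    def is_dag_processor_activated(self) -> bool:
        # Same comparison as the line added in the diff:
        # a negative interval deactivates the processor in the scheduler.
        return self.dag_dir_list_interval >= 0

    def starts_embedded_agent(self) -> bool:
        # Assumption: the scheduler starts its own agent only when the
        # processor is activated and no standalone processor is requested.
        return self.is_dag_processor_activated and not self.standalone_dag_processor


def should_refresh_dag_dir(elapsed_time_since_refresh: float, dag_dir_list_interval: int) -> bool:
    # Same comparison as the refresh condition quoted in the comment.
    # Elapsed time is never negative, so a negative interval always passes.
    return elapsed_time_since_refresh > dag_dir_list_interval


if __name__ == "__main__":
    for flags in (
        SchedulerFlags(standalone_dag_processor=False, dag_dir_list_interval=300),
        SchedulerFlags(standalone_dag_processor=False, dag_dir_list_interval=-1),
        SchedulerFlags(standalone_dag_processor=True, dag_dir_list_interval=300),
    ):
        print(flags, "-> embedded agent:", flags.starts_embedded_agent())
    print("refresh with interval -1:", should_refresh_dag_dir(0.0, -1))
```

Under these assumptions, a negative interval turns off the agent inside the scheduler, while a separately launched dag processor reading the same value would re-list the dag directory on every loop.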



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org