Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/02/08 10:00:27 UTC

[GitHub] [airflow] ashb commented on a change in pull request #21399: (WIP) Reduce DB load incurred by Stale DAG deactivation

ashb commented on a change in pull request #21399:
URL: https://github.com/apache/airflow/pull/21399#discussion_r801455157



##########
File path: airflow/dag_processing/manager.py
##########
@@ -503,6 +507,40 @@ def start(self):
 
         return self._run_parsing_loop()
 
+    @provide_session
+    def _deactivate_stale_dags(self, session=None):
+        now = timezone.utcnow()
+        elapsed_time_since_refresh = (now - self.last_deactivate_stale_dags_time).total_seconds()
+        if elapsed_time_since_refresh > self.deactivate_stale_dags_interval:
+            last_parsed = {
+                fp: self.get_last_finish_time(fp) for fp in self.file_paths if self.get_last_finish_time(fp)
+            }
+            to_deactivate = set()
+            dags_parsed = (
+                session.query(DagModel.dag_id, DagModel.fileloc, DagModel.last_parsed_time)
+                .filter(DagModel.is_active)
+                .all()
+            )
+            for dag in dags_parsed:
+                if (
+                    dag.fileloc in last_parsed
+                    and (dag.last_parsed_time + timedelta(seconds=self._processor_timeout))

Review comment:
       This feels like the wrong timeout to use. The processor timeout only bounds how long processing a single file may take:
   
   ```
   # How long before timing out a DagFileProcessor, which processes a dag file
   dag_file_processor_timeout = 50
   ```
   
   But that doesn't mean that every DAG file should be "reparsed" every 50 seconds.
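
To make the concern concrete, here is a minimal, hypothetical sketch. It is not the PR's code: the right-hand side of the comparison is cut off in the hunk above, so this assumes, purely for illustration, that staleness is judged against the current time. `is_stale` and `REPARSE_INTERVAL` are made-up names; only the value 50 comes from the config snippet quoted above.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_parsed_time, now, threshold_seconds):
    """Return True if the last parse is older than the given threshold."""
    return last_parsed_time + timedelta(seconds=threshold_seconds) < now


# dag_file_processor_timeout from the config snippet quoted above.
PROCESSOR_TIMEOUT = 50
# Hypothetical: how often each file actually gets reparsed, in seconds.
REPARSE_INTERVAL = 300

now = datetime(2022, 2, 8, 10, 0, 0, tzinfo=timezone.utc)
# A healthy DAG that was parsed two minutes ago and is not yet due for a reparse.
last_parsed = now - timedelta(seconds=120)

# Using the processor timeout alone flags the healthy DAG as stale.
stale_with_timeout = is_stale(last_parsed, now, PROCESSOR_TIMEOUT)

# A threshold that also accounts for the reparse interval does not.
stale_with_interval = is_stale(last_parsed, now, REPARSE_INTERVAL + PROCESSOR_TIMEOUT)

print(stale_with_timeout, stale_with_interval)
```

Under these assumptions, any file that is reparsed less often than every 50 seconds would look stale between parses, even though nothing is wrong with it.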



