Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/08/02 23:46:10 UTC

[GitHub] [airflow] argibbs commented on a diff in pull request #25489: Dag processor manager queue split

argibbs commented on code in PR #25489:
URL: https://github.com/apache/airflow/pull/25489#discussion_r936119659


##########
airflow/dag_processing/manager.py:
##########
@@ -600,11 +620,14 @@ def _run_parsing_loop(self):
 
             self._kill_timed_out_processors()
 
-            # Generate more file paths to process if we processed all the files
-            # already.
-            if not self._file_path_queue:
+            # Generate more file paths to process if we processed all the files already. Note for this
+            # to clear down, we must have first either:
+            # 1. cleared the priority queue then cleared this set while draining the std queue, or
+            # 2. been unable to clear the priority queue, hit max_file_process_interval, and drained the set
+            #    while clearing overdue files
+            if not self._outstanding_std_file_paths:

Review Comment:
   This is a key change, as mentioned in the PR description: we refresh the file list from disk once the `_outstanding_std_file_paths` _`set`_ is empty. The queue might still have SLA callbacks in it, but that shouldn't stop us from refreshing from disk.
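
   To make the intent concrete, here is a minimal, self-contained sketch of the idea (this is not Airflow's actual `DagFileProcessorManager` code; the class and helper names below are invented for illustration, and only `_file_path_queue` and `_outstanding_std_file_paths` are taken from the diff). It shows why the refill check keys off the set rather than the queue: callback entries can still be sitting in the queue after every standard file path has been handed out, and that alone should not block a refresh from disk.

   ```python
   from collections import deque


   class ParsingLoopSketch:
       """Hypothetical stand-in for the parsing loop, for illustration only."""

       def __init__(self, dag_files):
           self._dag_files = list(dag_files)
           # Work queue: may hold both standard file paths and priority callback work.
           self._file_path_queue = deque()
           # Standard (non-callback) paths still waiting to be processed.
           self._outstanding_std_file_paths = set()

       def enqueue_callback(self, path):
           # Callbacks (e.g. SLA callbacks) jump the queue but are NOT counted
           # as outstanding standard files.
           self._file_path_queue.appendleft(("callback", path))

       def _refresh_file_paths(self):
           # Stand-in for re-listing the DAG directory from disk.
           for path in self._dag_files:
               self._file_path_queue.append(("std", path))
               self._outstanding_std_file_paths.add(path)

       def run_once(self):
           # The condition from the diff: refresh once the *set* is empty,
           # even if callbacks are still queued.
           if not self._outstanding_std_file_paths:
               self._refresh_file_paths()
           if self._file_path_queue:
               kind, path = self._file_path_queue.popleft()
               if kind == "std":
                   self._outstanding_std_file_paths.discard(path)
               return kind, path
           return None


   if __name__ == "__main__":
       loop = ParsingLoopSketch(["dags/a.py", "dags/b.py"])
       loop.enqueue_callback("dags/a.py")  # an SLA callback waiting in the queue
       for _ in range(5):
           print(loop.run_once())
   ```

   With the old check (`if not self._file_path_queue:`), the queued callback alone would have been enough to postpone the refresh; keying on the set means only unprocessed standard files do.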



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org