You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2023/01/03 22:54:56 UTC

[GitHub] [airflow] hussein-awala opened a new pull request, #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

hussein-awala opened a new pull request, #28711:
URL: https://github.com/apache/airflow/pull/28711

   <!--
   Thank you for contributing! Please make sure that your code changes
   are covered with tests. And in case of new features or big changes
   remember to adjust the documentation.
   
   Feel free to ping committers for the review!
   
   In case of an existing issue, reference it using one of the following:
   
   closes: #ISSUE
   related: #ISSUE
   
   How to write a good git commit message:
   http://chris.beams.io/posts/git-commit/
   -->
   closes: #28678
   ---
   
   Disable the dag file processor agent when `dag_dir_list_interval`  is negative. It's similar to setting `standalone_dag_processor` to `true`, but with this conf, we would never check stale dags for cleaning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28711:
URL: https://github.com/apache/airflow/pull/28711#discussion_r1061101923


##########
airflow/config_templates/config.yml:
##########
@@ -2025,6 +2025,7 @@ scheduler:
     dag_dir_list_interval:
       description: |
         How often (in seconds) to scan the DAGs directory for new files. Default to 5 minutes.
+        Setting to negative value will disable the dag processor.

Review Comment:
   ```suggestion
           Setting this to a negative value disables the DAG processor.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1437499190

   @potiuk - Yes I see, I will think about a solution for these tasks and propose it before implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] bhavaniravi commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by GitBox <gi...@apache.org>.
bhavaniravi commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1370396573

   It will be a good idea to add it here as well
   https://airflow.apache.org/docs/apache-airflow/stable/concepts/scheduler.html#scheduler-configuration-options


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] uranusjr commented on a diff in pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by GitBox <gi...@apache.org>.
uranusjr commented on code in PR #28711:
URL: https://github.com/apache/airflow/pull/28711#discussion_r1061146864


##########
airflow/jobs/scheduler_job.py:
##########
@@ -151,6 +151,7 @@ def __init__(
         # How many seconds do we wait for tasks to heartbeat before mark them as zombies.
         self._zombie_threshold_secs = conf.getint("scheduler", "scheduler_zombie_task_threshold")
         self._standalone_dag_processor = conf.getboolean("scheduler", "standalone_dag_processor")
+        self._is_dag_processor_activated = conf.getint("scheduler", "dag_dir_list_interval") >= 0

Review Comment:
   What happens if I set `standalone_dag_processor` to True and `dag_dir_list_interval` to a minus value? Does it cause the standalone processor to run idly, or?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1563652513

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala closed pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by "hussein-awala (via GitHub)" <gi...@apache.org>.
hussein-awala closed pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value
URL: https://github.com/apache/airflow/pull/28711


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] potiuk commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by "potiuk (via GitHub)" <gi...@apache.org>.
potiuk commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1435670553

   @hussein-awala - I am afraid @ashb is right. DagFileProcessor executes more stuff. It performs sla management, also when DagFileProcessorProcess gets started it receives callbacks to execute for that particular file. Those are the callbacks that are generated elsewhere (allways when callbacks cannot be executed within the process that generates the callback for whatever reason) and stored in the database. Then whenever DagFileProcessor processes a given file, it will execute the callbacks.
   
   So the idea that you can stop DagFileProcessor and run it "when needed" is not good as @ashb mentioned.  Unless we introduce some other entity that can process those callbacks of course.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] github-actions[bot] commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1499780807

   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on a diff in pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by GitBox <gi...@apache.org>.
hussein-awala commented on code in PR #28711:
URL: https://github.com/apache/airflow/pull/28711#discussion_r1061955624


##########
airflow/jobs/scheduler_job.py:
##########
@@ -151,6 +151,7 @@ def __init__(
         # How many seconds do we wait for tasks to heartbeat before mark them as zombies.
         self._zombie_threshold_secs = conf.getint("scheduler", "scheduler_zombie_task_threshold")
         self._standalone_dag_processor = conf.getboolean("scheduler", "standalone_dag_processor")
+        self._is_dag_processor_activated = conf.getint("scheduler", "dag_dir_list_interval") >= 0

Review Comment:
   As I understood, when `standalone_dag_processor` is set to True, the standalone processor is not created automatically, we just tell the scheduler that we don't want to create a dag processor in a new thread, then we need to create the dag processor in a separate pod/container/process using Airflow CLI, if we don't run it, all the dags will be considered as stale after `dag_stale_not_seen_duration` seconds, and they will be deleted from the Metadata.
   
   With this PR, we can disable the dag file processor agent created in the scheduler process, and we can run the standalone dag processor each time we need to process our dags files, without any risk to delete the dags from the Metadata.
   In the CLI there is no condition about `dag_dir_list_interval`, so the `DagFileProcessorManager` can be created normally, and `if elapsed_time_since_refresh > self.dag_dir_list_interval` will be always True, which is similar to providing a 0 or a very small value. In addition, if we run the standalone dag processor in a custom process (without using the helm chart, ex CI pipeline), we can provide a different conf value to control the interval between  the dag dir list.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [airflow] hussein-awala commented on pull request #28711: Disable the dag file processor agent when dag_dir_list_interval is negative value

Posted by GitBox <gi...@apache.org>.
hussein-awala commented on PR #28711:
URL: https://github.com/apache/airflow/pull/28711#issuecomment-1371539491

   @ashb I reread the code of `DagFileProcessorManager` and I found that it only does two types of cleaning:
   1- clean stale dags
   2- clean the import errors for files that no longer exist
   
   The two cleaning depend on a change in the dag folder, and if nothing is changed in the dags folder, there will be no dags considered as stale, and the `DagFileProcessorManager` will not delete any import error because all the files are still there.
   This config is useful when the dags files are changed slowly, and in order to import the new dags and update the existing dags, we need to run a standalone dag processor (at least once) which can detect the stale dags and delete them, and clean the import errors for the for files that no longer exist.
   
   >  else some states in case of failure will never progress
   
   If there is another cleanup done by `DagFileProcessorManager`, please mention it to check how to handle it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org