Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/08/30 22:10:25 UTC

[GitHub] [airflow] yesemsanthoshkumar commented on a change in pull request #10531: Move scheduler basics documentation to Airflow docs

yesemsanthoshkumar commented on a change in pull request #10531:
URL: https://github.com/apache/airflow/pull/10531#discussion_r479822072



##########
File path: docs/scheduler.rst
##########
@@ -18,11 +18,41 @@
 Scheduler
 ==========
 
-The Airflow scheduler monitors all tasks and DAGs, then triggers the
-task instances once their dependencies are complete. Behind the scenes,
-the scheduler spins up a subprocess, which monitors and stays in sync with all
-DAGs in the specified DAG directory. Once per minute, by default, the scheduler
-collects DAG parsing results and checks whether any active tasks can be triggered.
+The scheduler is the core component in Airflow that is responsible for monitoring all tasks and DAGs and triggers the task instances once their dependencies are complete. And for this reason, it is imperative to learn about the working of the scheduler.
+
+Scheduling in airflow involves the 3 following components.
+
+1. ``DAGFileProcessor`` - Responsible for parsing the DAG definition in a file and creating the necessary DAG runs and TaskInstances
+
+2. ``DAGFileProcessorManager`` - Responsible for listing the files in the DagBag and creating new DAGFileProcessors when required
+
+3. ``SchedulerJob`` - Responsible for sending the TaskInstances to the executor

Review comment:
       AFAIK, the `SchedulerJob` is responsible for moving task instances (TIs) from one state to another and running them. Would `Responsible for transitioning the task instances from one state to another and executing them via the configured executor` be a better explanation here? Please correct me if I'm wrong.
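
To make the suggested wording concrete, here is a toy sketch of "transition the task instance, then hand it to an executor". It is not Airflow's actual code: `ToyTaskInstance`, `ToyExecutor`, and `ToySchedulerJob` are hypothetical names made up for illustration, and the state names are only meant to resemble the usual scheduled/queued/running/success progression.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List


class State(Enum):
    # Illustrative states only; not imported from Airflow.
    SCHEDULED = "scheduled"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"


@dataclass
class ToyTaskInstance:
    # Hypothetical stand-in for a task instance.
    task_id: str
    state: State = State.SCHEDULED


class ToyExecutor:
    """Hypothetical executor that runs the task callable synchronously."""

    def execute(self, ti: ToyTaskInstance, fn: Callable[[], None]) -> None:
        ti.state = State.RUNNING
        fn()
        ti.state = State.SUCCESS


@dataclass
class ToySchedulerJob:
    """Hypothetical scheduler loop body: transition the TI, then delegate."""

    executor: ToyExecutor
    history: List[State] = field(default_factory=list)

    def process(self, ti: ToyTaskInstance, fn: Callable[[], None]) -> None:
        self.history.append(ti.state)      # state as found by the scheduler
        ti.state = State.QUEUED            # the scheduler performs the transition
        self.history.append(ti.state)
        self.executor.execute(ti, fn)      # the executor actually runs the task
        self.history.append(ti.state)      # final state after execution
```

In this sketch the scheduler owns the scheduled-to-queued transition while the executor owns the run itself, which is the split the proposed sentence tries to describe.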



