Posted to commits@airflow.apache.org by "JELEZ RADITCHKOV (JIRA)" <ji...@apache.org> on 2017/09/28 00:27:00 UTC

[jira] [Created] (AIRFLOW-1653) distributed schedulers with dedicated dags folders

JELEZ RADITCHKOV created AIRFLOW-1653:
-----------------------------------------

             Summary: distributed schedulers with dedicated dags folders
                 Key: AIRFLOW-1653
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1653
             Project: Apache Airflow
          Issue Type: Improvement
          Components: scheduler
    Affects Versions: 1.8.1
         Environment: Linux
            Reporter: JELEZ RADITCHKOV


We schedule many thousands of tasks per job and run multiple jobs. We are considering running multiple distributed schedulers, each configured (via its own airflow.cfg) to process a dedicated segment of the dags directory, using the Celery executor.
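For illustration, each scheduler's configuration might look something like the sketch below. The segment paths are hypothetical; {{dags_folder}} and {{executor}} are the standard {{[core]}} settings.

{code:title=airflow.cfg (scheduler instance 1)}
[core]
# Each scheduler instance points at its own slice of the DAG definitions.
dags_folder = /opt/airflow/dags/segment1
# All instances share the same Celery executor and result backend.
executor = CeleryExecutor
{code}

A second instance would be identical except for {{dags_folder = /opt/airflow/dags/segment2}}, and so on.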

One thing I am concerned about is the section of jobs.py where orphaned tasks are reset (quoted below). This section is executed on startup of each scheduler instance.

My question is: does this section create some sort of conflict between the scheduler instances? What is its impact in this setup?

{code}
self.logger.info("Resetting state for orphaned tasks")
# grab orphaned tasks and make sure to reset their state
active_runs = DagRun.find(
    state=State.RUNNING,
    external_trigger=False,
    session=session,
    no_backfills=True,
)
for dr in active_runs:
    self.logger.info("Resetting {} {}".format(dr.dag_id,
                                              dr.execution_date))
    self.reset_state_for_orphaned_tasks(dr, session=session)
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)