Posted to commits@airflow.apache.org by "Ash Berlin-Taylor (Jira)" <ji...@apache.org> on 2021/05/05 08:37:00 UTC

[jira] [Closed] (AIRFLOW-5283) Separate scheduling jobs from executing jobs

     [ https://issues.apache.org/jira/browse/AIRFLOW-5283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor closed AIRFLOW-5283.
--------------------------------------
    Resolution: Abandoned

> Separate scheduling jobs from executing jobs
> --------------------------------------------
>
>                 Key: AIRFLOW-5283
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5283
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.9.0, 1.10.4
>            Reporter: Gero Vermaas
>            Priority: Critical
>
> Currently, Airflow does not schedule new DAG runs once the number of active runs reaches `max_active_runs` ([see this|https://github.com/apache/airflow/blob/d760d63e1a141a43a4a43daee9abd54cf11c894b/airflow/jobs.py#L768] for Airflow 1.9; Airflow 1.10 behaves the same).
> As a result, if a DAG run occasionally takes longer than the interval between scheduled runs, some runs are missed, because the next DAG run to be scheduled is the first one planned after the long-running run has finished.
> For example, imagine a DAG that runs every hour with `max_active_runs` set to 1, and suppose the 02:00 DAG run takes (for some reason) 2 hours 45 minutes to complete instead of the usual 15 minutes.
> This would mean that:
>  * The DAG run of 02:00 finishes at 04:45
>  * The DAG runs of 03:00 and 04:00 are not scheduled, because a DAG run is already active and `max_active_runs` is set to 1
>  * The next DAG run to be scheduled is the one for 05:00
>  * The DAG runs of 03:00 and 04:00 are never scheduled.
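The timeline above can be reproduced with a small simulation. This is a minimal sketch of the behaviour as described in this report, not Airflow's actual scheduler code: the function `simulate_coupled` and its parameters are hypothetical, each run is labelled by the time it is due (as in the example), and a slot that finds `max_active_runs` runs still active is assumed to be dropped for good.

```python
from datetime import datetime, timedelta


def simulate_coupled(slots, durations, max_active_runs, default_duration):
    """Hypothetical model of the reported behaviour.

    At each planned slot, a DAG run is created only if fewer than
    max_active_runs runs are still active; otherwise the slot is
    dropped and never revisited.
    Returns the list of slots that actually got a DAG run.
    """
    end_times = []  # finish times of the runs created so far
    created = []
    for slot in slots:
        still_active = [end for end in end_times if end > slot]
        if len(still_active) >= max_active_runs:
            continue  # the limit gates scheduling, so this slot is lost
        created.append(slot)
        end_times.append(slot + durations.get(slot, default_duration))
    return created


# Hourly slots from 02:00 to 05:00; the 02:00 run takes 2h45m instead of 15 min.
day = datetime(2019, 1, 1)
slots = [day + timedelta(hours=h) for h in range(2, 6)]
durations = {day + timedelta(hours=2): timedelta(hours=2, minutes=45)}

created = simulate_coupled(slots, durations, max_active_runs=1,
                           default_duration=timedelta(minutes=15))
print([t.strftime("%H:%M") for t in created])  # ['02:00', '05:00']
```

Only the 02:00 and 05:00 slots get a run; 03:00 and 04:00 are never created, matching the list above.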
> The problem is that scheduling and execution of DAG runs are currently both tied to the `max_active_runs` setting. These should be separated so that DAG runs are scheduled at all planned times, but at most `max_active_runs` of them execute concurrently.
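One way to picture the proposed separation is the sketch below. It is again a hypothetical model rather than Airflow code: every planned slot gets a DAG run, runs wait in a first-in, first-out queue, and at most `max_active_runs` of them execute at the same time. The function name `simulate_decoupled` and its parameters are illustrative assumptions, and the example data reuses the hourly scenario from the report.

```python
import heapq
from datetime import datetime, timedelta


def simulate_decoupled(slots, durations, max_active_runs, default_duration):
    """Hypothetical model of the proposal.

    Every slot gets a DAG run. Runs start in slot order, each as soon as
    it is due and fewer than max_active_runs runs are executing.
    Returns a dict mapping each slot to the time its run starts.
    """
    running = []  # min-heap of finish times of runs holding a concurrency slot
    starts = {}
    for slot in slots:
        if len(running) < max_active_runs:
            start = slot  # a concurrency slot is free, start on time
        else:
            # wait for the earliest-finishing run to free a concurrency slot
            start = max(slot, heapq.heappop(running))
        starts[slot] = start
        heapq.heappush(running, start + durations.get(slot, default_duration))
    return starts


# Hourly slots from 02:00 to 05:00; the 02:00 run takes 2h45m instead of 15 min.
day = datetime(2019, 1, 1)
slots = [day + timedelta(hours=h) for h in range(2, 6)]
durations = {day + timedelta(hours=2): timedelta(hours=2, minutes=45)}

starts = simulate_decoupled(slots, durations, max_active_runs=1,
                            default_duration=timedelta(minutes=15))
for slot, start in starts.items():
    print(slot.strftime("%H:%M"), "starts at", start.strftime("%H:%M"))
# 02:00 starts at 02:00
# 03:00 starts at 04:45
# 04:00 starts at 05:00
# 05:00 starts at 05:15
```

With `max_active_runs=1` no two runs overlap, yet the 03:00 and 04:00 runs still happen, just later than planned.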



--
This message was sent by Atlassian Jira
(v8.3.4#803005)