Posted to commits@airflow.apache.org by "Andrew Heuermann (JIRA)" <ji...@apache.org> on 2017/03/30 07:19:41 UTC

[jira] [Updated] (AIRFLOW-1056) Single dag run triggered when un-pausing job with catchup=False

     [ https://issues.apache.org/jira/browse/AIRFLOW-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Heuermann updated AIRFLOW-1056:
--------------------------------------
    Description: 
With "catchup=False", a single dag run is still triggered when a dag is un-paused after one or more run windows were missed.

In airflow/jobs.py:create_dag_run(): when catchup is disabled, it updates the dag.start_date here to prevent the backfill: https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770
But it looks like the function schedules dag runs based on a window (using sequential run times as lower and upper bounds), so it will always schedule a single dag run if there is a missed run between the last run and the time at which the dag was un-paused, even if it was un-paused AFTER those missed runs.
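
To make the window behaviour concrete, here is a minimal sketch of the check as described above. It is a simplified model, not the actual code in airflow/jobs.py: the helper names (latest_tick, next_run_date) are hypothetical, and the schedule is assumed to be a fixed timedelta rather than a cron expression.

```python
from datetime import datetime, timedelta


def latest_tick(moment, anchor, interval):
    """Return the latest schedule tick at or before `moment`, counting from `anchor`."""
    return anchor + ((moment - anchor) // interval) * interval


def next_run_date(last_run, now, interval, catchup):
    """Hypothetical model: pick the next window to run, or None if none has closed yet."""
    candidate = last_run + interval
    if not catchup:
        # Assumed behaviour: the effective start date is moved up to the start
        # of the most recent fully elapsed window, which skips older windows.
        effective_start = latest_tick(now, last_run, interval) - interval
        candidate = max(candidate, effective_start)
    period_end = candidate + interval
    # A run is created as soon as its window has closed, regardless of
    # whether the dag was paused while that window elapsed.
    return candidate if period_end <= now else None


interval = timedelta(days=1)
last_run = datetime(2017, 3, 1)
unpaused_at = datetime(2017, 3, 30, 7, 19)

# The dag was paused for weeks, yet one run is still picked on un-pause.
print(next_run_date(last_run, unpaused_at, interval, catchup=False))
# prints 2017-03-29 00:00:00
```

In this model the older missed windows are skipped, but the most recent closed window still satisfies the check, which matches the single unexpected run.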

Some ideas for solutions:
* Pass in the time when the scheduler last ran and use that as the lower bound of the window, though I'm not sure how easy that is to get to.
* Do something when a dag with catchup=False is unpaused, such as updating the start_date or marking missed runs as skipped (the latter may be expensive).
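
As a rough illustration of the first idea, the sketch below adds an explicit lower bound to the window check. It is not a proposed patch: the function name and the lower_bound parameter are hypothetical, the schedule is assumed to be a fixed timedelta, and whether the bound would be the scheduler's last run time or the un-pause time is left open.

```python
from datetime import datetime, timedelta


def next_run_with_lower_bound(last_run, now, interval, lower_bound):
    """Hypothetical variant: ignore windows that closed at or before `lower_bound`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    candidate = last_run + interval
    # Skip every window that had already closed by the lower bound,
    # i.e. the windows that were missed while the dag was paused.
    while candidate + interval <= lower_bound:
        candidate += interval
    # Only create a run once the first remaining window has closed.
    return candidate if candidate + interval <= now else None


interval = timedelta(days=1)
last_run = datetime(2017, 3, 1)
unpaused_at = datetime(2017, 3, 30, 7, 19)

# Right after un-pausing, nothing is scheduled for the missed windows.
print(next_run_with_lower_bound(last_run, unpaused_at, interval, unpaused_at))
# prints None

# Once the first window after the un-pause closes, scheduling resumes.
later = datetime(2017, 3, 31, 0, 5)
print(next_run_with_lower_bound(last_run, later, interval, unpaused_at))
# prints 2017-03-30 00:00:00
```

The loop is linear in the number of missed windows; a real implementation would more likely compute the first eligible window arithmetically or through the dag's schedule helpers.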

There might be a simpler solution I'm missing.

  was:
When "catchup=False" a single job run is still triggered when un-pausing a dag when there are missed run windows. 

It updates the dag.start_date here to prevent the backfill: https://github.com/apache/incubator-airflow/blob/bb39078a35cf2bceea58d7831d7a2028c8ef849f/airflow/jobs.py#L770.
But it looks like the function schedules dags based on a window (using sequential run times as lower and upper bounds) it still schedules one dag run.

The only ideas I have now on how to fix is to pass in the time when the scheduler last ran and use that as the lower bound of the window, but not sure how easy that is to get to. There might be a simpler solution I'm missing.


> Single dag run triggered when un-pausing job with catchup=False
> ---------------------------------------------------------------
>
>                 Key: AIRFLOW-1056
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1056
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Andrew Heuermann


