Posted to commits@airflow.apache.org by "Abhilash Kishore (Jira)" <ji...@apache.org> on 2020/03/19 02:56:00 UTC

[jira] [Updated] (AIRFLOW-7090) With depends_on_past=True, second instance of task not scheduled even when first instance ran successfully

     [ https://issues.apache.org/jira/browse/AIRFLOW-7090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Abhilash Kishore updated AIRFLOW-7090:
--------------------------------------
    Description: 
The first task of my DAG has `depends_on_past=True` and `wait_for_downstream=True`. The DAG ran automatically when I turned it `On`, and that run completed successfully. I then triggered the DAG manually (after the first run had finished), but this time the first task did not start running. The `Task Instance Details` page for this task shows `depends_on_past is true for this task's DAG, but the previous task instance has not run yet.`

According to the [docs][1] on `depends_on_past (boolean)`:

> when set to True, keeps a task from getting triggered if the previous
> schedule for the task hasn’t succeeded.

The first DAG run succeeded, and so did the first instance of the first task. Why, then, does the second instance of the first task complain that the `previous task instance has not run yet`?

Relevant parts of my code:

```python
...
args = {
    'owner': 'USC Graduate School',
    'start_date': days_ago(1),
}

dag = DAG(
    dag_id='enrollment_import_poc',
    default_args=args,
    schedule_interval='0 0 * * *',
    dagrun_timeout=timedelta(minutes=60),
    max_active_runs=1,
    template_searchpath=os.environ.get('AIRFLOW_HOME'),
    tags=['uscgradschool'],
)

schools = MsSqlOperator(
    task_id='schools',
    depends_on_past=True,
    wait_for_downstream=True,
    sql=os.path.join("queries", "01_schools.sql"),
    mssql_conn_id="mssql_local",
    autocommit=True,
    dag=dag,
)
...
```
[![First Task - Task Instances][2]][2]
[![First Task - Second Task Instance Details][5]][5]
[![DAG Run History][3]][3]
[![DAG Details][4]][4]

[1]: https://airflow.apache.org/docs/stable/concepts.html#trigger-rules
[2]: https://i.stack.imgur.com/xzqeU.png
[3]: https://i.stack.imgur.com/2Vt4v.png
[4]: https://i.stack.imgur.com/dkQYi.png
[5]: https://i.stack.imgur.com/TUDef.png
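
The quoted rule for `depends_on_past` can be modelled with a small, Airflow-free sketch. It is only an illustration under a stated assumption, not a confirmed diagnosis of this issue: it assumes the "previous" task instance is looked up by the previous tick of the `0 0 * * *` schedule. The names `previous_midnight`, `depends_on_past_status`, and `states`, and the example dates, are hypothetical and are not Airflow APIs or values taken from this report.

```python
from datetime import datetime, timedelta


def previous_midnight(execution_date):
    """Return the tick of the cron '0 0 * * *' strictly before execution_date."""
    midnight = execution_date.replace(hour=0, minute=0, second=0, microsecond=0)
    if midnight == execution_date:
        # Already on a tick, so step back one full day.
        midnight -= timedelta(days=1)
    return midnight


def depends_on_past_status(execution_date, states):
    """Toy depends_on_past check.

    `states` maps an execution date to the recorded state of the task
    instance for that date (hypothetical storage, not Airflow's model).
    """
    state = states.get(previous_midnight(execution_date))
    if state is None:
        return "previous task instance has not run yet"
    if state != "success":
        return "previous task instance is in state " + state
    return "ok"


# One successful scheduled run is on record (hypothetical date).
states = {datetime(2020, 3, 18): "success"}

# The next scheduled run finds that instance via the previous tick.
scheduled = depends_on_past_status(datetime(2020, 3, 19), states)

# A manual run at 02:50 looks up the 2020-03-19 00:00 tick, which has no record.
manual = depends_on_past_status(datetime(2020, 3, 19, 2, 50), states)
```

Under this assumption, a manually triggered run can be blocked even though every recorded instance succeeded, because no task instance is recorded for the schedule tick it looks up. Whether Airflow 1.10.9 resolves the previous instance this way would need to be checked against its scheduler code.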


> With depends_on_past=True, second instance of task not scheduled even when first instance ran successfully
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-7090
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-7090
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.9
>            Reporter: Abhilash Kishore
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)