Posted to commits@airflow.apache.org by "Daniel Imberman (Jira)" <ji...@apache.org> on 2020/03/29 18:53:00 UTC

[jira] [Commented] (AIRFLOW-593) Tasks do not get backfilled sequentially

    [ https://issues.apache.org/jira/browse/AIRFLOW-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17070484#comment-17070484 ] 

Daniel Imberman commented on AIRFLOW-593:
-----------------------------------------

This issue has been moved to https://github.com/apache/airflow/issues/7989

> Tasks do not get backfilled sequentially
> ----------------------------------------
>
>                 Key: AIRFLOW-593
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-593
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun, scheduler
>    Affects Versions: 1.7.1.3
>            Reporter: Jong Kim
>            Priority: Minor
>         Attachments: Screen Shot 2018-07-20 at 10.04.24 AM.png
>
>
> I need the tasks within a DAG to complete in order when running backfills. I am running locally on my Mac using SequentialExecutor.
> Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with start_date: datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, which must complete in order: task0 -> task1 -> task2. This dependency is set using .set_downstream().
> Today (2016/10/22) I reset the database, turn on the DAG using the on/off toggle in the webserver, and run "airflow scheduler", which automatically backfills starting from start_date.
> It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run sequentially, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': False, I see Airflow running tasks grouped by sequence number, something like this, which is not what I want:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 21, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 20, 11, 0, 0) task2
> datetime(2016, 10, 21, 11, 0, 0) task2
> With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to run in the order I need, but instead it runs some tasks out of order, like this:
> datetime(2016, 10, 20, 11, 0, 0) task0
> datetime(2016, 10, 20, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
> datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
> datetime(2016, 10, 21, 11, 0, 0) task1
> datetime(2016, 10, 21, 11, 0, 0) task2
> Is this a bug? If not, am I understanding 'depends_on_past' and 'wait_for_downstream' correctly? What do I need to do?
> The only remedy I can think of is to backfill each date manually.
> Public gist of DAG: https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
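
The two orderings described in the quoted report can be reproduced with a small standalone sketch. It does not use Airflow at all; it only enumerates the (execution date, task) pairs from the description in the order the reporter expected and in the order reported with 'depends_on_past': False. The variable names run_major and task_major are illustrative labels, not Airflow terms.

```python
from datetime import datetime, timedelta

# Values taken from the issue description.
start_date = datetime(2016, 10, 20, 11, 0, 0)
tasks = ["task0", "task1", "task2"]

# The two daily runs backfilled on 2016/10/22: 2016/10/20 and 2016/10/21.
execution_dates = [start_date + timedelta(days=i) for i in range(2)]

# Expected: every task of one run finishes before the next run starts.
run_major = [(d, t) for d in execution_dates for t in tasks]

# Reported with depends_on_past=False: each task runs for all dates
# before the next task in the chain starts.
task_major = [(d, t) for t in tasks for d in execution_dates]

for label, order in (("expected", run_major), ("reported", task_major)):
    print(label)
    for d, t in order:
        print(f"  {d:%Y-%m-%d %H:%M} {t}")
```

Both lists contain the same six task instances; only the iteration order differs, which is the distinction the report is drawing.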



--
This message was sent by Atlassian Jira
(v8.3.4#803005)