Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/29 18:51:38 UTC
[GitHub] [airflow] dimberman opened a new issue #7989: Tasks do not get backfilled sequentially
URL: https://github.com/apache/airflow/issues/7989
Ticket was created 25/Oct/16 00:06
**Description**
I need the tasks within a DAG to complete in order when running backfills. I am running locally on my Mac using SequentialExecutor.
Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with start_date set to datetime(2016, 10, 20, 11, 0, 0). The DAG consists of three tasks that must complete in order: task0 -> task1 -> task2. This dependency is set using .set_downstream().
Today (2016/10/22) I reset the database, turn on the DAG using the on/off toggle in the webserver, and run "airflow scheduler", which automatically backfills starting from start_date.
It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run sequentially, like this:
datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 20, 11, 0, 0) task2
datetime(2016, 10, 21, 11, 0, 0) task0
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task2
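The expected order listed above is date-major: every task for one execution date finishes before any task for the next date starts. A minimal Python sketch that enumerates it is shown here. The names `TASKS`, `EXECUTION_DATES`, and `expected_order` are illustrative only and do not come from the gist or from Airflow.

```python
from datetime import datetime, timedelta

# Illustrative names; the task ids and start date mirror the description above.
TASKS = ["task0", "task1", "task2"]
START_DATE = datetime(2016, 10, 20, 11, 0, 0)

# The two daily execution dates that get backfilled (2016/10/20 and 2016/10/21).
EXECUTION_DATES = [START_DATE + timedelta(days=i) for i in range(2)]


def expected_order(dates, tasks):
    """Return (execution_date, task) pairs in date-major order."""
    return [(date, task) for date in dates for task in tasks]


for date, task in expected_order(EXECUTION_DATES, TASKS):
    print(date, task)
```

The outer loop runs over dates and the inner loop over tasks, which is exactly the ordering the backfill is expected to follow.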
With 'depends_on_past': False, I see Airflow running tasks grouped by their position in the sequence rather than by date, something like this, which is not what I want:
datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 21, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 20, 11, 0, 0) task2
datetime(2016, 10, 21, 11, 0, 0) task2
With 'depends_on_past': True and 'wait_for_downstream': True, I expect it to run in the order I need, but instead it runs some tasks out of order, like this:
datetime(2016, 10, 20, 11, 0, 0) task0
datetime(2016, 10, 20, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task0 <- out of order!
datetime(2016, 10, 20, 11, 0, 0) task2 <- out of order!
datetime(2016, 10, 21, 11, 0, 0) task1
datetime(2016, 10, 21, 11, 0, 0) task2
Is this a bug? If not, am I understanding 'depends_on_past' and 'wait_for_downstream' correctly? What do I need to do?
The only remedy I can think of is to backfill each date manually.
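As a way to reason about the question, the trace observed with 'depends_on_past': True and 'wait_for_downstream': True can be checked against a small model of what the Airflow documentation describes for these flags: depends_on_past makes a task instance wait for the same task in the previous run, and wait_for_downstream additionally makes it wait for the tasks immediately downstream of that previous instance. The sketch below is a simplified assumption-based model in plain Python, not Airflow's scheduler code, and the names `prerequisites` and `violations` are hypothetical.

```python
from datetime import datetime

D20 = datetime(2016, 10, 20, 11, 0, 0)
D21 = datetime(2016, 10, 21, 11, 0, 0)

# task0 -> task1 -> task2, as described above.
UPSTREAM = {"task0": [], "task1": ["task0"], "task2": ["task1"]}
DOWNSTREAM = {"task0": ["task1"], "task1": ["task2"], "task2": []}
PREVIOUS_RUN = {D20: None, D21: D20}


def prerequisites(date, task):
    """Instances that must finish first under this simplified model."""
    needed = {(date, up) for up in UPSTREAM[task]}  # normal in-run dependencies
    prev = PREVIOUS_RUN[date]
    if prev is not None:
        needed.add((prev, task))  # depends_on_past (assumed semantics)
        # wait_for_downstream (assumed semantics): only the IMMEDIATE
        # downstream tasks of the previous instance are waited on.
        needed.update((prev, down) for down in DOWNSTREAM[task])
    return needed


def violations(order):
    """Return (instance, missing prerequisites) for each rule broken by order."""
    done, broken = set(), []
    for instance in order:
        missing = prerequisites(*instance) - done
        if missing:
            broken.append((instance, missing))
        done.add(instance)
    return broken


observed = [
    (D20, "task0"), (D20, "task1"), (D21, "task0"),
    (D20, "task2"), (D21, "task1"), (D21, "task2"),
]
print(violations(observed))
```

Under these assumptions the observed order breaks none of the modeled constraints, because 2016/10/21 task0 only waits for 2016/10/20 task0 and task1, not task2. If the model is accurate, the two flags alone may not be enough to force strictly date-by-date execution.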
Public gist of DAG: https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
**Use case / motivation**
**Related Issues**
Moved here from https://issues.apache.org/jira/browse/AIRFLOW-593
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services