Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/03/29 18:51:38 UTC

[GitHub] [airflow] dimberman opened a new issue #7989: Tasks do not get backfilled sequentially

URL: https://github.com/apache/airflow/issues/7989
 
 
   
   
   Ticket was created 25/Oct/16 00:06
   
   **Description**
   
   I need the tasks within a DAG to complete in order when running backfills. I am running locally on my Mac using SequentialExecutor.
   
   Let's say I have a DAG running daily at 11AM UTC (0 11 * * *) with start_date datetime(2016, 10, 20, 11, 0, 0). The DAG consists of 3 tasks, which must complete in order: task0 -> task1 -> task2. This dependency is set using .set_downstream().
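
As a rough illustration of the schedule described above, the sketch below lists the execution dates that a daily 11:00 UTC schedule would have made due by a given moment. It uses only the Python standard library, not Airflow itself. The helper name `due_execution_dates` and the "now" value of 2016-10-22 12:00 UTC are assumptions made for illustration. It relies on Airflow's convention that a run for a given execution date is triggered only once that schedule interval has ended.

```python
from datetime import datetime, timedelta


def due_execution_dates(start_date, now, interval=timedelta(days=1)):
    """Return execution dates whose schedule interval has fully elapsed by `now`.

    Hypothetical helper: a simplified model of a fixed-interval schedule,
    not Airflow's scheduler logic.
    """
    dates = []
    current = start_date
    while current + interval <= now:
        dates.append(current)
        current += interval
    return dates


start = datetime(2016, 10, 20, 11, 0, 0)  # start_date from the issue
now = datetime(2016, 10, 22, 12, 0, 0)    # assumed time the scheduler was started

for execution_date in due_execution_dates(start, now):
    print(execution_date.isoformat())
```

With these assumed values the sketch yields the two execution dates discussed in this report, 2016/10/20 and 2016/10/21.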
   
   Today (2016/10/22) I reset the database, turned on the DAG using the on/off toggle in the webserver, and ran "airflow scheduler", which automatically backfills starting from start_date.
   
   It backfills 2016/10/20 and 2016/10/21. I expect the backfill to run sequentially, like this:
   datetime(2016, 10, 20, 11, 0, 0) task0
   datetime(2016, 10, 20, 11, 0, 0) task1
   datetime(2016, 10, 20, 11, 0, 0) task2
   datetime(2016, 10, 21, 11, 0, 0) task0
   datetime(2016, 10, 21, 11, 0, 0) task1
   datetime(2016, 10, 21, 11, 0, 0) task2
   
   With 'depends_on_past': False, I see Airflow run tasks grouped by sequence number, something like this, which is not what I want:
   datetime(2016, 10, 20, 11, 0, 0) task0
   datetime(2016, 10, 21, 11, 0, 0) task0
   datetime(2016, 10, 20, 11, 0, 0) task1
   datetime(2016, 10, 21, 11, 0, 0) task1
   datetime(2016, 10, 20, 11, 0, 0) task2
   datetime(2016, 10, 21, 11, 0, 0) task2
   
   With 'depends_on_past': True and 'wait_for_downstream': True, I expected it to run the way I need, but instead it runs some tasks out of order, like this:
   datetime(2016, 10, 20, 11, 0, 0) task0
   datetime(2016, 10, 20, 11, 0, 0) task1
   datetime(2016, 10, 21, 11, 0, 0) task0   <- out of order!
   datetime(2016, 10, 20, 11, 0, 0) task2   <- out of order!
   datetime(2016, 10, 21, 11, 0, 0) task1
   datetime(2016, 10, 21, 11, 0, 0) task2
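
To make the two flags concrete, here is a minimal sketch that checks a run order against one reading of their documented semantics: 'depends_on_past' requires the same task's previous run to have succeeded, and 'wait_for_downstream' additionally requires the tasks immediately downstream of that previous instance to have succeeded. This is a simplified stdlib model, not Airflow's scheduler code; the names `first_violation`, `UPSTREAM`, and `DOWNSTREAM` are hypothetical.

```python
from datetime import datetime

# Linear chain from the issue: task0 -> task1 -> task2.
UPSTREAM = {"task0": [], "task1": ["task0"], "task2": ["task1"]}
DOWNSTREAM = {"task0": ["task1"], "task1": ["task2"], "task2": []}


def first_violation(order, dates):
    """Return the first (date, task) that runs before its prerequisites, else None.

    Simplified model (an assumption, not Airflow's implementation) with
    depends_on_past=True and wait_for_downstream=True on every task.
    """
    done = set()
    for date, task in order:
        # Ordinary in-run dependencies: direct upstream tasks of the same date.
        needed = {(date, up) for up in UPSTREAM[task]}
        idx = dates.index(date)
        if idx > 0:
            prev = dates[idx - 1]
            # depends_on_past: the same task in the previous run.
            needed.add((prev, task))
            # wait_for_downstream: tasks immediately downstream of that instance.
            needed.update((prev, down) for down in DOWNSTREAM[task])
        if not needed <= done:
            return (date, task)
        done.add((date, task))
    return None


d20 = datetime(2016, 10, 20, 11, 0, 0)
d21 = datetime(2016, 10, 21, 11, 0, 0)

# The order observed with both flags enabled.
observed = [
    (d20, "task0"), (d20, "task1"), (d21, "task0"),
    (d20, "task2"), (d21, "task1"), (d21, "task2"),
]
# The order observed with 'depends_on_past': False.
grouped = [
    (d20, "task0"), (d21, "task0"), (d20, "task1"),
    (d21, "task1"), (d20, "task2"), (d21, "task2"),
]

print(first_violation(observed, [d20, d21]))
print(first_violation(grouped, [d20, d21]))
```

Under this model the observed order passes the check, because task2 is not immediately downstream of task0, so the 2016/10/21 task0 only has to wait for the 2016/10/20 task0 and task1. The grouped order fails at the 2016/10/21 task0. If that reading is right, the two flags permit strict date-by-date ordering but do not enforce it.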
   
   Is this a bug? If not, am I understanding 'depends_on_past' and 'wait_for_downstream' correctly? What do I need to do?
   
   The only remedy I can think of is to backfill each date manually.
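
A minimal sketch of that manual remedy, assuming the Airflow 1.x command form `airflow backfill -s START -e END dag_id` (newer releases use a different subcommand layout) and assuming the CLI accepts full ISO timestamps for both bounds. The DAG id `my_dag` and the helper `per_date_backfill_commands` are hypothetical. The sketch only builds the command strings, one per execution date, so that they can be reviewed and then run one after another.

```python
from datetime import datetime, timedelta


def per_date_backfill_commands(dag_id, first, last, interval=timedelta(days=1)):
    """Build one single-date backfill command per execution date.

    Hypothetical helper: assumes the Airflow 1.x `airflow backfill` CLI form.
    """
    commands = []
    day = first
    while day <= last:
        stamp = day.strftime("%Y-%m-%dT%H:%M:%S")
        # Same start and end bound, so each command covers exactly one run.
        commands.append("airflow backfill -s {0} -e {0} {1}".format(stamp, dag_id))
        day += interval
    return commands


commands = per_date_backfill_commands(
    "my_dag",  # hypothetical DAG id
    datetime(2016, 10, 20, 11, 0, 0),
    datetime(2016, 10, 21, 11, 0, 0),
)
for command in commands:
    print(command)
```

Running the commands strictly one at a time would make every task of one date finish before the next date starts, at the cost of doing by hand what the scheduler was expected to do.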
   
   Public gist of DAG: https://gist.github.com/jong-eatsa/cba1bf3c182b38e966696da47164faf1
   
   **Use case / motivation**
   
   **Related Issues**
   
   Moved here from https://issues.apache.org/jira/browse/AIRFLOW-593
