You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@airflow.apache.org by Sunil Khaire <su...@gmail.com> on 2020/09/18 06:25:45 UTC

Question on dynamic tasks in a DAG and wait_for_downstream --- wait_on_downstream waits forever for dynamically added tasks in later dag runs.

Hello Team,



Currently we are using the airflow version - 1.10.10 to data ingest.



In our DAG, we create tasks dynamically based on data volume , i.e if data
volume is high, number of parallel tasks increases and if the data volume
is less the number of parallel tasks reduces in the next run or vice versa.

As DAG execution instances use the same table to update, we use
*‘wait_for_downstream'*  to True to maintain the data consistency and make
sure the next run should not happen if the previous run is in progress or
failed.



In this scenario, we are seeing one issue i.e. If previous instances have
less number of tasks than the current one because of dynamic task creation,
then the current DAG is always in a waiting state . As the current DAG  is
waiting for the new task/s which are generated during the run but not
existing in the previous DAG instance, but waiting for the same tasks to be
in completion state in the previous DAG.  As soon as we manually mark those
tasks as completed in the previous DAG instance, current DAG starts running
.



Let me know if you have any work around for this scenario.



*wait_on_downstream waits forever for dynamically added tasks in later dag
runs.*




Thanks ,

Sunil Khaire

Data Engineer