Posted to dev@airflow.apache.org by Felix Uellendall <fe...@protonmail.com.INVALID> on 2019/07/21 19:56:28 UTC
Propagate skipped state from inside subdag to parent dag
Hey all,
I am wondering if there should be an "official" way to propagate a
skipped state of the last downstream tasks in a subdag to the parent dag.
Simple Use Case:
I have a subdag with the following tasks "transfer data from an api to
s3", "convert json to csv", "transform csv".
Sometimes the api returns a json with an empty data object. The
"transfer" task succeeds, which I think is correct. The "convert" task,
which actually goes from json to pandas.DataFrame to csv, could check
whether the data frame is empty before proceeding. At this point I raise
an AirflowSkipException to skip this task (and all downstream tasks that
need the data to work). I don't want the task to fail with a plain
AirflowException, since I think no data does not necessarily mean there
is an error. It just means there is no data available for the requested
time period, so I want to skip it. The problem comes when the subdag run
has finished: its state is set to success, like all other runs where
data was available. That means you can't see the difference from
outside of the subdag (in the parent dag).
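A minimal sketch of the skip-on-empty check in the "convert" step might look like the following. AirflowSkipException is Airflow's real exception for marking a task instance as skipped. The callable name convert_json_to_csv and the assumption that the API wraps its records in a top-level "data" list are hypothetical. To keep the sketch self-contained it uses the standard-library csv module instead of pandas, and it falls back to a stand-in exception class when Airflow is not installed. With pandas, the equivalent check would be `if df.empty: raise AirflowSkipException(...)`.

```python
import csv
import io
import json

try:
    # Real Airflow exception: raising it marks the task instance as skipped.
    from airflow.exceptions import AirflowSkipException
except ImportError:
    # Stand-in (assumption) so the sketch runs without Airflow installed.
    class AirflowSkipException(Exception):
        pass


def convert_json_to_csv(raw_json: str) -> str:
    """Hypothetical "convert" callable: JSON payload -> CSV text.

    Assumes the API returns {"data": [ {record}, ... ]}; an empty list
    or empty object under "data" means "no data for this period".
    """
    payload = json.loads(raw_json)
    records = payload.get("data") or []
    if not records:
        # No data is not an error: skip instead of fail, so downstream
        # tasks with the default trigger rule are skipped as well.
        raise AirflowSkipException("API returned no data for this period")

    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()
```

Raising the exception (rather than returning early) is what lets Airflow record the task as skipped instead of successful.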
I would be very interested in what you guys think about this. Should we
add a feature to "propagate skipped state from inside subdag to parent
dag", or is there an easier / better way to solve my problem? Please let
me know :)
(P.S. I made it work, but I am wondering about an official way of doing
it if you guys agree with the idea.)
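One possible rule for such a feature is sketched below. It is not Airflow's actual run-state logic, and it is not the implementation mentioned above (which isn't shown). It only illustrates the idea: if every last downstream task of the subdag ended up skipped, the parent dag should see "skipped" rather than "success". The function effective_subdag_state is hypothetical. The state strings mirror the values in Airflow's State class ("success", "skipped", "failed", "upstream_failed").

```python
from typing import Iterable


def effective_subdag_state(leaf_states: Iterable[str]) -> str:
    """Hypothetical rule for the state a parent dag should see.

    leaf_states: final states of the subdag's last downstream tasks.
    """
    states = list(leaf_states)
    # Any failure still wins over everything else.
    if any(s in ("failed", "upstream_failed") for s in states):
        return "failed"
    # Only report "skipped" when every leaf task was skipped,
    # e.g. the "convert" task raised AirflowSkipException on empty data.
    if states and all(s == "skipped" for s in states):
        return "skipped"
    return "success"
```

A parent-side operator could then, for example, raise AirflowSkipException itself when this returns "skipped", so the skip becomes visible in the parent dag. That wiring is an assumption about how the feature might look, not an existing option.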
Kind regards,
Felix