Posted to dev@airflow.apache.org by Robin Edwards <ro...@goshift.com> on 2018/09/03 15:58:57 UTC

dynamic dag generation

Hello!

I have a DAG where the input size (rows) may grow or shrink significantly.

The first step (A) determines the size of the input set and groups it into
batches of a pre-defined size.
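
To make step A concrete, here is a minimal sketch in plain Python (no
Airflow involved). The names make_batches and batch_count are hypothetical,
and it assumes the rows fit in an in-memory sequence:

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(rows: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `rows`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


def batch_count(row_count: int, batch_size: int) -> int:
    """Number of batches needed for `row_count` rows (ceiling division)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return -(-row_count // batch_size)
```

batch_count only needs the row count, so the number of downstream tasks
can be worked out without loading the rows themselves.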

In the second step I want to generate one task per batch to perform an
upload to a third-party API (Google AdWords) / computation.

The final step is a sensor which waits for the status of the batch to become
completed, followed by a final task.
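
The shape I have in mind, sketched as a plain Python dependency map rather
than real Airflow operators. The task ids and the plan_tasks helper are
made up for illustration, and it assumes one sensor per batch (a single
sensor covering all batches would also fit the description above):

```python
from typing import Dict, List


def plan_tasks(num_batches: int) -> Dict[str, List[str]]:
    """Map each task id to the ids of the tasks that run directly after it."""
    downstream: Dict[str, List[str]] = {"prepare_batches": [], "finalise": []}
    for i in range(num_batches):
        upload = f"upload_batch_{i}"
        wait = f"wait_for_batch_{i}"
        # Step A fans out to one upload task per batch.
        downstream["prepare_batches"].append(upload)
        # Each upload is followed by the sensor for that batch.
        downstream[upload] = [wait]
        # Every sensor fans back in to the single final task.
        downstream[wait] = ["finalise"]
    return downstream
```

The number of keys in the result changes with num_batches, which is exactly
the part I am unsure Airflow copes with between runs.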

Thoughts so far:

- I don't necessarily need all tasks to execute in parallel; I just want to
be able to control the number that do through Pools
- I could potentially calculate the batch size and number of tasks required
at DAG compile time but this would make my DAG loading very slow (as I will
have lots of DAGs doing this)
- Is changing the number of tasks in a DAG dynamically going to screw up
Airflow?
- I found this https://stackoverflow.com/a/51977800 but it feels like a bit
of a hack.
- I could trigger multiple DAG runs but this makes it harder to visualise
and trace through the UI

Or am I approaching this problem in the wrong way?

Thanks for your help,

Rob