Posted to users@airflow.apache.org by Daniel Standish <dp...@gmail.com> on 2021/06/10 03:16:35 UTC

dynamic task scenario

*background*

Suppose you have a DAG with tasks that run some data science model for
*active* A/B tests.

Suppose there are 2-5 tests running at any given time.

And the tests come and go every couple weeks.

Right now, I have a task at the start of the DAG that updates an
Airflow Variable with the list of active experiments.

Then the DAG file iterates over this Variable and defines one task per experiment.

This is a little hacky, but since the active experiments don't change that
rapidly, it works fine.

*question*

How would you handle this scenario?

*thoughts*

We could combine the model runs into a single task.  Then we wouldn't need
the update-vars step.  But then the models run in series, which is slow, and
you don't get distinct task logs or per-task retries.
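The single-task alternative is roughly the following (`model_for` is a placeholder for the real model run). The drawbacks fall out of the shape of the code: everything shares one log and one retry, so a failure on the third experiment reruns all of them:

```python
def model_for(name):
    # Placeholder for the actual model run.
    return f"model result for {name}"

def run_all_models(experiments):
    """One task body looping over every active test, in series."""
    results = {}
    for name in experiments:
        # If this raises, the whole task fails and retries from the top,
        # and all experiments share a single log.
        results[name] = model_for(name)
    return results
```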

*idea*

maybe there needs to be some kind of "subtask" concept: some way for one
task to spawn any number of tasks based on the circumstances it finds at run
time (e.g. the specific list of active A/B tests right now).
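A hypothetical sketch of that "spawn" concept in plain Python (not a real Airflow API): a discovery step inspects the world at run time, then a subtask is fanned out per item, each producing its own independent result, here approximated with `concurrent.futures`:

```python
from concurrent.futures import ThreadPoolExecutor

def discover_active_tests():
    # The "parent" task inspects current circumstances at run time
    # (in practice: a DB query for active experiments).
    return ["exp_a", "exp_b", "exp_c"]

def run_model(test_name):
    # Each "subtask" runs independently, so it could get its own
    # log and its own retry in a real scheduler.
    return (test_name, f"scored {test_name}")

def spawn_subtasks():
    tests = discover_active_tests()
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(run_model, tests))
```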

thoughts?

Re: dynamic task scenario

Posted by Bruno Gonzalez <br...@homelight.com>.
Hi Daniel. We have a similar scenario and use almost the same approach.
The difference is that we use a file that is copied locally to the
workers, which the DAG reads to define the tasks.
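That variant might look something like this (the file name and JSON format are assumptions): a file synced to every machine that parses the DAG, read at parse time, falling back to no tasks if it is missing:

```python
import json
from pathlib import Path

# Hypothetical path; in practice this file is copied to every machine
# that parses the DAG (scheduler and workers).
EXPERIMENTS_FILE = Path("active_experiments.json")

def load_active_experiments(path=EXPERIMENTS_FILE):
    """Read the locally synced experiment list; empty list if absent."""
    if not path.exists():
        return []
    return json.loads(path.read_text())

def build_tasks(path=EXPERIMENTS_FILE):
    return [f"run_model__{name}" for name in load_active_experiments(path)]
```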

I haven't dug into all the new features in Airflow 2, but I tried the
subtask approach with the latest 1.10 versions and it didn't work as intended.

It would be great if someone else could drop some thoughts and discuss a
"better" solution.

On Thu, Jun 10, 2021 at 12:17 AM Daniel Standish <dp...@gmail.com>
wrote:
