Posted to dev@airflow.apache.org by Duy Tran <dt...@siftscience.com> on 2018/01/23 21:51:25 UTC

Running multiple versions of a DAG

Hi Airflow-Dev,

I have a question about how to approach running multiple versions of a
DAG. We have a main DAG, and want to run/test multiple versions of it in
the same Airflow cluster. To allow this, I'm trying to "submit" a version
of the DAG to Airflow to run in isolation.



*Approach:*
I'm currently copying the DAG into its own timestamped folder on the
master and workers. Example:

DAGS_HOME/my_dag_20180123_1241234/my_dag.py
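As a rough sketch of the copy step on a single host: the helper name
submit_dag and the timestamp format are assumptions for illustration, not
what our tooling actually uses, and the same copy would still have to be
repeated on the master and on every worker.

```python
import shutil
import time
from pathlib import Path


def submit_dag(src_file, dags_home, stamp=None):
    """Copy a DAG file into its own timestamped folder under dags_home.

    Returns the path of the copied file. The folder name is what the DAG
    later uses as its DAG id.
    """
    src = Path(src_file)
    # Assumed timestamp format; the post does not state the exact one.
    stamp = stamp or time.strftime("%Y%m%d_%H%M%S")
    target_dir = Path(dags_home) / "{}_{}".format(src.stem, stamp)
    # Fail instead of silently overwriting an already submitted version.
    target_dir.mkdir(parents=True, exist_ok=False)
    return Path(shutil.copy2(str(src), str(target_dir / src.name)))
```

Refusing to reuse an existing folder keeps each submitted version
immutable, at the cost of an error if two submissions land on the same
timestamp.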

The DAG "my_dag.py" dynamically looks up the DAG id from its parent folder
name. I've also set "schedule_interval=None", as it's only manually
triggered.
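A minimal sketch of the folder-name lookup, using only the standard
library. The function name dag_id_from_path is hypothetical, and the DAG
construction is left as a comment because it assumes an Airflow 1.x
install.

```python
import os


def dag_id_from_path(dag_file):
    """Return the name of the folder that directly contains dag_file."""
    return os.path.basename(os.path.dirname(os.path.abspath(dag_file)))


# Inside my_dag.py this would be used roughly as follows (assumes Airflow
# is installed; start_date is a placeholder value):
#
#   from datetime import datetime
#   from airflow import DAG
#
#   dag = DAG(
#       dag_id=dag_id_from_path(__file__),
#       schedule_interval=None,  # manually triggered only
#       start_date=datetime(2018, 1, 1),
#   )
```

Called on DAGS_HOME/my_dag_20180123_1241234/my_dag.py, the function yields
my_dag_20180123_1241234, so each copied folder registers as a separate DAG.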

And then I manually trigger the DAG, passing the conf as a quoted JSON
string:

airflow trigger_dag my_dag_20180123_1241234 -c '{"conf": "value"}'

I've had to edit a couple of Airflow configs to get this to work correctly,
so that I can trigger the DAG immediately after copying it over to Airflow.

dags_are_paused_at_creation = False
min_file_process_interval = 1

With the above changes, DAGs can be submitted to Airflow in isolation
from one another by multiple developers.
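The manual trigger step can be scripted as well. The sketch below only
builds the command line; trigger_command is a hypothetical helper, and it
relies on the trigger_dag subcommand and its -c option as used above. It
encodes the conf with json.dumps and shell-quotes it, which avoids
hand-written quoting mistakes.

```python
import json
import shlex


def trigger_command(dag_id, conf):
    """Build an `airflow trigger_dag` shell command with a JSON conf."""
    conf_json = json.dumps(conf, sort_keys=True)
    return "airflow trigger_dag {} -c {}".format(
        shlex.quote(dag_id), shlex.quote(conf_json)
    )


print(trigger_command("my_dag_20180123_1241234", {"conf": "value"}))
```

The resulting string can be pasted into a shell or handed to a job runner
on the master.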


*Issues:*
When I have a lot of DAGs (50+) in Airflow, each with 30+ tasks, Airflow
seems to take a long time to schedule tasks. (I think this is a known
issue from looking through JIRA, but I want to double-check with anyone
here.) I tried pausing the DAGs to see if that would reduce scheduling
latency, but it looks like Airflow still reloads paused DAGs. I've also
created another scheduled DAG that cleans up DAGs older than some
threshold, so they don't clutter the UI too much.
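The file-system half of such a cleanup could look roughly like the sketch
below. It assumes folders are named <name>_<YYYYMMDD>_<digits> and parses
only the date part; remove_old_dag_folders and the naming pattern are
illustrative assumptions, and the function only deletes folders on the
host where it runs.

```python
import re
import shutil
from datetime import datetime, timedelta
from pathlib import Path

# Assumed folder naming: <name>_<YYYYMMDD>_<digits>.
STAMPED = re.compile(r"^.+_(\d{8})_\d+$")


def remove_old_dag_folders(dags_home, max_age_days, today=None):
    """Delete timestamped DAG folders older than max_age_days.

    Returns the names of the removed folders, sorted.
    """
    today = today or datetime.now()
    cutoff = today - timedelta(days=max_age_days)
    removed = []
    for folder in sorted(Path(dags_home).iterdir()):
        match = STAMPED.match(folder.name)
        if not folder.is_dir() or match is None:
            continue  # leave DAGs without a timestamped folder alone
        try:
            created = datetime.strptime(match.group(1), "%Y%m%d")
        except ValueError:
            continue  # eight digits that are not a valid date
        if created < cutoff:
            shutil.rmtree(str(folder))
            removed.append(folder.name)
    return removed
```

Wrapped in a task of a scheduled DAG, this would need to run on the master
and on each worker, since every host holds its own copy of the folders.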

I have to copy the DAG to the master and workers myself. It would be nice
if there were a mechanism for Airflow to manage the distribution of the
DAG from the master to the workers.

Dynamically looking up the DAG id from its parent folder seems a bit hacky.

*Questions:*

This works for the most part, and allows us to run slightly different
versions of the same DAG in isolation, but I wanted to see if there are
opinions on this approach, or if there's a better way.

Thanks in advance, would appreciate any feedback!

Duy