Posted to dev@airflow.apache.org by Luke Rohde <lr...@quartethealth.com> on 2017/02/04 23:34:45 UTC

serialization of tasks within DAG run

Hello Airflowians!

At the recent Airflow meetup in NYC at Blue Apron, I talked with a few
folks (Andrew, Jeremiah) about a question that had been on my mind: if you
deploy code in the middle of a DAG run, can't the completed parts of the
DAG end up having run a different version of the code than the incomplete
parts? The DAG itself is serialized and stored in the database when using
remote executors, but that isn't a complete solution, since the code
backing the tasks could still differ. It seems to me that workers
shouldn't necessarily need direct access to the code as deployed; just as
they deserialize the DAG from the DB, they should be able to deserialize
the task (or obtain it from some other consistent source) and run it.
Jeremiah mentioned he recalled someone looking into this in the past and
hitting a wall with the pickle-ability of jinja templates.
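To illustrate the wall I mean (this is a minimal sketch, not Airflow code): compiled templates ultimately boil down to dynamically generated render functions, and pickle cannot serialize those, for the same reason it cannot serialize a lambda or closure:

```python
import pickle

# Stand-in for a compiled template's render function: a function
# object created at runtime rather than importable by name.
render = lambda name: "hello %s" % name

try:
    pickle.dumps(render)
    print("picklable")
except Exception as exc:
    # pickle serializes functions by reference (module + qualified name),
    # so anything it can't re-import by name fails here.
    print("not picklable: %s" % type(exc).__name__)
```

So any scheme that pickles task instances would need to either pickle the raw template strings and re-compile on the worker, or use some serializer beyond the stdlib pickle.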

Curious what other people think about this issue, and whether the
maintainers would be interested in a patch attempting to fix it - I'd be
happy to take a stab at it if so.

-Luke