Posted to dev@airflow.apache.org by Jason Chen <ch...@gmail.com> on 2016/08/26 03:35:33 UTC

Delete a dag ?

Hi,

 How do I delete a dag in Airflow (instead of turning it off)?

 Thanks.

Jason

Re: Delete a dag ?

Posted by siddharth anand <sa...@apache.org>.
There was some discussion around this, as mentioned by Vince, when jlowin
proposed a fix: https://github.com/apache/incubator-airflow/pull/1344

It's really funny (sad) that deleting a dag is so counter-intuitive.

We (Agari) deploy by essentially pointing Airflow's dags_folder (specified
in airflow.cfg) to a git repo. To delete a dag, we:

   1. *git rm* the dag
   2. Update the git repo on all Airflow machines, i.e. the machines running
   Airflow webservers and schedulers
   3. Restart the webserver and scheduler

That does the trick. We run the local scheduler. If you run Celery, you
probably have to add the Celery worker processes to the list in steps 2 and
3.

Dags are identified by name (and, I believe, also by a path on the file
system), so if a dag with the same name were ever checked back into git,
Airflow would resurrect it based on what's in the db. Unfortunately, it
would keep the original start date and much of the history. The result is
that the scheduler would start filling in runs for the time since the dag
was "deleted", doing a potentially expensive backfill.
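
To get a feel for how large such a backfill could be, here is a rough
sketch that counts the schedule intervals elapsed between the last run
and the day the dag reappears. The dates and the missed_intervals helper
are made up for illustration, and this is not Airflow's actual scheduling
logic, only the arithmetic behind it.

```python
from datetime import datetime, timedelta


def missed_intervals(last_run, now, interval):
    """Count the whole schedule intervals between last_run and now."""
    return max(0, (now - last_run) // interval)


# Hypothetical dates: the last run before the dag was removed, and the
# day a dag with the same name is checked back into git.
last_run = datetime(2016, 3, 1)
reintroduced = datetime(2016, 8, 26)

daily_runs = missed_intervals(last_run, reintroduced, timedelta(days=1))
hourly_runs = missed_intervals(last_run, reintroduced, timedelta(hours=1))
print(daily_runs, hourly_runs)
```

The shorter the schedule interval, the more runs there are to fill in for
the same gap.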

Now, parts of the code, like the scheduler, periodically reparse the dags
in the dags folder, so simply deleting the dag's records from the DB will
not suffice: the next reparse will resurrect it. The reparsing happens
every few minutes. However, cleaning up the DB is important as well, to
avoid a conflict when you do want to reload a dag that was previously
active.

We (Agari) and Airbnb and a lot of other users depend on git to distribute
a dag to airflow machines, hence the deletion of a dag also depends on git.
This is an unspecified design pattern/dependency of running Airflow. More
plainly, we could just say that airflow depends on some distributed file
system for distributing dags.

One way to decouple the deletion of dags from their distribution is to
write a "tombstone" in a new tombstone table. The tombstone could act as an
"ignore this dag" filter applied during dag parsing. We could also store a
hash of the contents of the dag file and match tombstones on dag_id+hash.
That way, if someone later picked the same dag_id as a previously deleted
dag but with different file contents, the hashes would differ and the
tombstone would not match.
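
A minimal sketch of the dag_id+hash matching described above, assuming the
tombstones are available as a set of (dag_id, hash) pairs. The function
names, the choice of SHA-256, and the in-memory set are all assumptions for
illustration; no such table or API exists in Airflow.

```python
import hashlib


def content_hash(source: bytes) -> str:
    """Hex SHA-256 digest of a dag file's raw bytes."""
    return hashlib.sha256(source).hexdigest()


def is_tombstoned(dag_id: str, source: bytes, tombstones: set) -> bool:
    """True if this exact (dag_id, content hash) pair was deleted earlier."""
    return (dag_id, content_hash(source)) in tombstones


def filter_parsed_dags(parsed: dict, tombstones: set) -> dict:
    """Drop tombstoned dags from a {dag_id: file_bytes} mapping."""
    return {dag_id: src for dag_id, src in parsed.items()
            if not is_tombstoned(dag_id, src, tombstones)}


# Hypothetical dag file contents: same dag_id, different bytes.
old_source = b"dag = DAG('etl_daily')  # original definition\n"
new_source = b"dag = DAG('etl_daily')  # rewritten definition\n"

# The user deleted the original version of etl_daily.
tombstones = {("etl_daily", content_hash(old_source))}

visible_old = filter_parsed_dags({"etl_daily": old_source}, tombstones)
visible_new = filter_parsed_dags({"etl_daily": new_source}, tombstones)
```

Here the old file is filtered out during parsing, while a new file that
reuses the dag_id is picked up because its hash differs.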

There is a challenge around tombstone expiration: Airflow does not know
the details of its dags_folder's file system. Is it git, cvs, svn, nfs,
etc.? Hence, which command should be used to move or delete the dag file
permanently? Until the file is removed from the file system, the tombstone
cannot be expired. My proposal here is to keep tombstones around until the
user has done the necessary cleanup themselves. Airflow could check
periodically whether the dag file has been cleaned up and, at that time,
remove the tombstone and any remaining rows for the dag in other tables.
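
The periodic check could look roughly like this, assuming each tombstone
records the path of the file that defined the dag. The dict layout and the
expire_tombstones name are hypothetical; the only point is that expiry is
driven by the file disappearing, whatever tool the user removed it with.

```python
import os
import tempfile


def expire_tombstones(tombstones: dict) -> list:
    """Remove tombstones whose dag file is gone; return the expired dag_ids.

    `tombstones` maps dag_id -> path of the file that defined the dag.
    """
    expired = [dag_id for dag_id, path in tombstones.items()
               if not os.path.exists(path)]
    for dag_id in expired:
        del tombstones[dag_id]
        # A real implementation would also delete the dag's DB rows here.
    return expired


with tempfile.TemporaryDirectory() as dags_folder:
    kept_path = os.path.join(dags_folder, "still_here.py")
    gone_path = os.path.join(dags_folder, "cleaned_up.py")
    with open(kept_path, "w") as f:
        f.write("# dag file the user has not removed yet\n")

    tombstones = {"still_here": kept_path, "cleaned_up": gone_path}
    expired = expire_tombstones(tombstones)
```

Only the tombstone whose file no longer exists is expired; the other stays
in place and keeps filtering the dag until the user cleans up the file.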

I feel there is enough solutionizing in this email and the PR conversation
preceding it to welcome an implementation of this fix from the community.
If you have some time, please implement this and send a PR.
-s

Re: Delete a dag ?

Posted by Lance Norskog <la...@gmail.com>.
This is for the data model as of March 2016. I haven't tried it lately.
Wrap in a transaction.

For MySQL:

-- Replace BAD_DAG with the dag_id to delete.
set @dag_id = 'BAD_DAG';
start transaction;
delete from airflow.xcom where dag_id = @dag_id;
delete from airflow.task_instance where dag_id = @dag_id;
delete from airflow.sla_miss where dag_id = @dag_id;
delete from airflow.log where dag_id = @dag_id;
delete from airflow.job where dag_id = @dag_id;
delete from airflow.dag_run where dag_id = @dag_id;
delete from airflow.dag where dag_id = @dag_id;
commit;
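
The same cleanup can be sketched in Python with the deletes wrapped in one
transaction. This uses the standard-library sqlite3 module with an
in-memory database purely as a stand-in so the example runs on its own; it
is not a MySQL connection, and the single-column tables are a simplification.
The table names are the ones listed above (data model as of March 2016) and
may not match other Airflow versions.

```python
import sqlite3

# Table names taken from the email above (data model as of March 2016).
TABLES = ["xcom", "task_instance", "sla_miss", "log", "job", "dag_run", "dag"]


def delete_dag_rows(conn, dag_id):
    """Delete every row for dag_id from all tables in a single transaction."""
    deleted = 0
    with conn:  # commits on success, rolls back if any statement raises
        for table in TABLES:
            # Table names come from the fixed list above, never user input.
            cur = conn.execute(
                f"DELETE FROM {table} WHERE dag_id = ?", (dag_id,))
            deleted += cur.rowcount
    return deleted


# Stand-in schema: one dag_id column per table, two dags in each.
conn = sqlite3.connect(":memory:")
for table in TABLES:
    conn.execute(f"CREATE TABLE {table} (dag_id TEXT)")
    conn.execute(f"INSERT INTO {table} VALUES ('BAD_DAG')")
    conn.execute(f"INSERT INTO {table} VALUES ('GOOD_DAG')")
conn.commit()

removed = delete_dag_rows(conn, "BAD_DAG")
remaining = conn.execute("SELECT COUNT(*) FROM dag_run").fetchone()[0]
```

Because all deletes share one transaction, a failure partway through leaves
every table untouched rather than half-cleaned.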


-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: Delete a dag ?

Posted by Vince Reuter <vi...@gmail.com>.
Hey Jason, I think it's an open PR https://github.com/apache/incubator-airflow/pull/1344

-Vince
