Posted to dev@airflow.apache.org by Russell Jurney <ru...@gmail.com> on 2017/02/15 01:26:56 UTC

How to stop tasks from running!? Which sql table or file(s) is it?

I had a backfill operation that failed, and now I can't stop it from
running! I have tried many times to clear the tasks, but this has no
effect. I have tried stopping, clearing and restarting the scheduler, but
this has no effect.

I have opened the sqlite DB and want to remove the record that is causing
the job to run, but I don't know which table (there are lots!)? Is it just
the database, or is there a file some place that I need to edit?

Please help, because I run one thread on SQLite and so I can't get any
other tasks to run until I clear this one :(
---
Russell Jurney @rjurney <http://twitter.com/rjurney>
russell.jurney@gmail.com LI <http://linkedin.com/in/russelljurney> FB
<http://facebook.com/jurney> datasyndrome.com

Re: How to stop tasks from running!? Which sql table or file(s) is it?

Posted by Maxime Beauchemin <ma...@gmail.com>.
Hi Russell,

Individual task instances connect to the database at an interval specified
in your configuration file (30 secs by default) to emit heartbeats. In
recent versions, at each heartbeat, if the task instance's entry in the
task_instance table is set to "shutdown", has been deleted, or has a state
other than "running", the process shuts itself down properly.
Shutting down properly means running the operator's `on_kill` method if
defined, running the on_failure/on_retry callback if specified, bumping the
retry number by one (though this may happen on task restart, I forgot), and
marking the task_instance entry as failed (from memory). For that to be
possible, all tasks run in a subprocess so that the parent process can
handle the logic described above.
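
To make the heartbeat check concrete, here is a minimal sketch of the
decision described above, run against an in-memory SQLite database. It is
not Airflow's actual code: the three-column table and the function name
`should_shut_down` are assumptions made up for illustration.

```python
import sqlite3


def should_shut_down(conn, dag_id, task_id):
    """Return True when the row is gone or its state is not 'running'."""
    row = conn.execute(
        "SELECT state FROM task_instance WHERE dag_id = ? AND task_id = ?",
        (dag_id, task_id),
    ).fetchone()
    return row is None or row[0] != "running"


def demo():
    """Walk one hypothetical task through the three cases from the text."""
    conn = sqlite3.connect(":memory:")
    # Simplified, hypothetical schema; the real table has many more columns.
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, state TEXT)"
    )
    conn.execute(
        "INSERT INTO task_instance VALUES ('my_dag', 'my_task', 'running')"
    )
    results = [should_shut_down(conn, "my_dag", "my_task")]  # still running

    conn.execute("UPDATE task_instance SET state = 'shutdown'")
    results.append(should_shut_down(conn, "my_dag", "my_task"))  # marked

    conn.execute("DELETE FROM task_instance")
    results.append(should_shut_down(conn, "my_dag", "my_task"))  # row gone
    conn.close()
    return results
```

The point of the sketch is that the task process itself does the checking,
so a change to the row only takes effect at that process's next heartbeat.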

Similarly if a task fails to emit heartbeats for a certain period of time
while the state is still set to "running", the scheduler will handle the
failure itself. If retries are allowed for that task, it will be
re-triggered on the following scheduler cycle.
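
The scheduler-side check can be sketched the same way. This is an
illustration under assumptions, not the scheduler's real logic: the row
layout (`latest_heartbeat`, `try_number`, `retries`), the timeout value and
the returned state names are all hypothetical stand-ins.

```python
from datetime import datetime, timedelta


def find_zombies(task_rows, now, timeout):
    """Return rows still marked 'running' whose heartbeat is too old."""
    return [
        row
        for row in task_rows
        if row["state"] == "running"
        and now - row["latest_heartbeat"] > timeout
    ]


def next_state(row):
    """Retry while attempts remain, otherwise mark the task as failed.

    Assumes try_number counts attempts made so far, starting at 1.
    """
    if row["try_number"] <= row["retries"]:
        return "up_for_retry"
    return "failed"
```

A caller would run `find_zombies` once per scheduler cycle and apply
`next_state` to each row it returns.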

We used to have problems with "zombies" and "undead" but from my
understanding the vast majority of these have been addressed.

At scale pretty much anything can and will happen, and in rare cases you may
have to kill some zombies on worker boxes where, say, both the parent and
subprocesses are held up for some odd reason. If that does happen in your
environment, please try to get to the bottom of it and share what you find.
If it's minimal or very sporadic, I'd advise automating a distributed unix
command that kills old processes, targeting the specific issue you
identified.
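
One possible shape for such a cleanup, as a rough sketch only: it assumes
each worker can produce process listings in the form
"PID ELAPSED_SECONDS COMMAND" (for example from `ps -eo pid=,etimes=,args=`
on Linux procps), and the function names, the match pattern and the age
threshold are hypothetical choices you would adapt to your own issue.

```python
import os
import signal


def select_old_pids(ps_output, pattern, max_age_seconds):
    """Pick PIDs whose command contains pattern and whose age exceeds the limit."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)  # pid, elapsed seconds, full command
        if len(parts) < 3:
            continue
        pid_text, age_text, command = parts
        if not (pid_text.isdigit() and age_text.isdigit()):
            continue  # skip headers or malformed lines
        if pattern in command and int(age_text) > max_age_seconds:
            pids.append(int(pid_text))
    return pids


def kill_pids(pids, dry_run=True):
    """Send SIGTERM to each PID; with dry_run=True only report the targets."""
    for pid in pids:
        if dry_run:
            print("would send SIGTERM to", pid)
        else:
            os.kill(pid, signal.SIGTERM)
```

Keeping the selection separate from the kill step makes it easy to run in
dry-run mode first and inspect what would be targeted.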

Max

Re: How to stop tasks from running!? Which sql table or file(s) is it?

Posted by Russell Jurney <ru...@gmail.com>.
Ok, I deleted all references to the dag_id of this task in dag_run, jobs
and task_instance.

The database doesn't seem to control this. What does?

