Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2020/01/04 11:01:22 UTC

Killing Webserver/Scheduler gracefully

I would like to bring up a topic from the users@ list:
https://lists.apache.org/thread.html/5add5e8a19cb86ef2141d9d0634bd01c12d74a7655c4eddfa7b8e75a%40%3Cusers.airflow.apache.org%3E


It seems some people have trouble shutting down the Airflow
scheduler/webserver cleanly with signals. Is this already implemented, or
does anyone have insight or experience with it that they can share before
we dig deeper?

I know Tomek recently gained some experience with shutting down workers
cleanly and is looking into it, but I think it would be great to have
working, documented shutdown scenarios for the scheduler/webserver: which
signals work, how threads/processes behave when they receive them, etc.

Does anyone have any insight into this?

J.
-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
[image: Polidea] <https://www.polidea.com/>

Re: Killing Webserver/Scheduler gracefully

Posted by Jarek Potiuk <Ja...@polidea.com>.
Some more links, Huang. Docker has its own issues: depending on which
Docker entrypoint you use, you can get different (and wrong) behaviour. It
is especially problematic if you use a bash script as the entrypoint, which
can easily lead to zombie processes.

There are many blogs about it, but I think one of the best explanations can
be found here (https://github.com/krallin/tini/issues/8)

Most importantly, though, as of Docker 1.13 you can pass the --init flag
when starting a container, and Docker will run the "tini" init process for
you, which takes care of reaping zombie processes in the container.

See: https://github.com/krallin/tini#using-tini for details.
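
To make the role of such an init process more concrete, here is a minimal
Python sketch of its two jobs: forwarding signals to the main child and
reaping exited children. This is only an illustration, not how tini itself
is implemented; the name `run_as_init` is hypothetical, and in practice
--init or tini remains the better choice.

```python
import os
import signal
import sys


def run_as_init(argv):
    """Run argv as a child process, forward signals to it, and reap children.

    Returns the exit code of the main child (128 + signal number if the
    child was killed by a signal).
    """
    child_pid = os.fork()
    if child_pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached when exec itself failed

    def forward(signum, frame):
        # Pass the signal on to the main child instead of dying ourselves.
        try:
            os.kill(child_pid, signum)
        except ProcessLookupError:
            pass  # the child has already exited

    forwarded = (signal.SIGTERM, signal.SIGINT)
    previous = {sig: signal.signal(sig, forward) for sig in forwarded}
    try:
        while True:
            # Blocks until any child exits. When running as PID 1 this also
            # collects orphans re-parented to us, so zombies do not pile up.
            pid, status = os.wait()
            if pid == child_pid:
                if os.WIFSIGNALED(status):
                    return 128 + os.WTERMSIG(status)
                return os.WEXITSTATUS(status)
    finally:
        # Restore whatever handlers were installed before.
        for sig, handler in previous.items():
            if handler is not None:
                signal.signal(sig, handler)
```

A hypothetical entrypoint script could end with
`sys.exit(run_as_init(sys.argv[1:]))`, so that the container exits with the
status of the wrapped command.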

J.

On Sat, Jan 4, 2020 at 9:20 PM Huang Xinbin <bi...@gmail.com> wrote:

> +1 for the `stop` idea too. In my team, we use Docker to manage Airflow, so
> `stopping` and `restart on failure` are both handled by Docker. It has
> worked well so far, but I am not sure whether this is good practice and I
> would definitely like to hear other people's feedback.
>
> Also, thanks Jarek for the reference links; I will take a deeper look into
> those.
>
> Best
> Bin

Re: Killing Webserver/Scheduler gracefully

Posted by Huang Xinbin <bi...@gmail.com>.
+1 for the `stop` idea too. In my team, we use Docker to manage Airflow, so
`stopping` and `restart on failure` are both handled by Docker. It has
worked well so far, but I am not sure whether this is good practice and I
would definitely like to hear other people's feedback.

Also, thanks Jarek for the reference links; I will take a deeper look into
those.

Best
Bin


On Sat, Jan 4, 2020 at 6:59 AM hotmail <zh...@hotmail.com> wrote:

> +1 for the `stop` arg, and thanks Jarek for the clarification.
>
> Best wishes
> — Jiajie
>
>

Re: Killing Webserver/Scheduler gracefully

Posted by hotmail <zh...@hotmail.com>.
+1 for the `stop` arg, and thanks Jarek for the clarification.

Best wishes
— Jiajie


Re: Killing Webserver/Scheduler gracefully

Posted by Jarek Potiuk <Ja...@polidea.com>.
I also like the "stop" idea. To partly answer my own question, here is how
I understand the current behaviour.

We know that if you use systemd or similar (or simply run airflow in a
terminal and press ^C), the webserver and scheduler are shut down cleanly.
But I think we miss the case where you want to kill the webserver process
itself by its PID (even though we handle the --pid option).

Not everyone knows this, but pressing ^C actually sends the INT signal to
the whole foreground process group, not just to the main process. This
surprises many people, even those who know how signals work in Unix, so I
wanted to mention it here.
You can read more about it here:
https://unix.stackexchange.com/questions/149741/why-is-sigint-not-propagated-to-child-process-when-sent-to-its-parent-process
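
The group-wide delivery can be reproduced without a terminal. The following
is a minimal Python sketch, assuming a hypothetical "main" process that
starts one worker (a stand-in for the webserver/scheduler, not Airflow
code). It sends SIGINT to the process group with os.killpg, which is what
the terminal does on ^C, and both processes go away.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a webserver/scheduler: the main process starts
# one worker, prints the worker's PID, and then both simply sleep.
WORKER_CODE = (
    "import signal, time; "
    "signal.signal(signal.SIGINT, signal.SIG_DFL); "
    "time.sleep(60)"
)
MAIN_CODE = f"""
import signal, subprocess, sys, time
signal.signal(signal.SIGINT, signal.SIG_DFL)  # default action: terminate
worker = subprocess.Popen([sys.executable, "-c", {WORKER_CODE!r}])
print(worker.pid, flush=True)
time.sleep(60)
"""


def start_main():
    """Start the main process as leader of a new session and process group."""
    proc = subprocess.Popen(
        [sys.executable, "-c", MAIN_CODE],
        stdout=subprocess.PIPE,
        start_new_session=True,
    )
    worker_pid = int(proc.stdout.readline())
    return proc, worker_pid


def interrupt_group(proc, timeout=10.0):
    """Send SIGINT to the whole process group, as the terminal does on ^C."""
    # A session leader's PID is also its process group ID.
    os.killpg(proc.pid, signal.SIGINT)
    # The worker inherited the stdout pipe, so communicate() returns only
    # once both the main process and the worker have exited and closed it.
    proc.communicate(timeout=timeout)
    return proc.returncode
```

Calling `interrupt_group(start_main()[0])` returns the main process's exit
status; a timeout would mean that some process in the group survived.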


Systemd uses "control-group" KillMode that basically does the same - that's
why systemd integration works well for airflow.

But if you start the webserver/scheduler manually in -D mode, even with a
--pid file, and then run kill -INT <webserver pid> or kill -INT
<scheduler pid>, then (if we do not propagate the signal) only the main
process is killed. Child processes are re-parented to init and continue
running.
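
A minimal Python sketch of this orphaning, assuming a hypothetical main
process with a single worker (again a stand-in, not Airflow code): the
signal goes to the main PID only, the main process dies, and the worker is
still there afterwards.

```python
import os
import signal
import subprocess
import sys

# Hypothetical main process: starts one worker, prints its PID, then sleeps.
MAIN_CODE = """
import signal, subprocess, sys, time
signal.signal(signal.SIGINT, signal.SIG_DFL)  # default action: terminate
worker = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(worker.pid, flush=True)
time.sleep(60)
"""


def kill_main_only(timeout=10.0):
    """Signal only the main PID and report (main exit status, worker alive?)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", MAIN_CODE],
        stdout=subprocess.PIPE,
        start_new_session=True,  # detached from our own process group
    )
    worker_pid = int(proc.stdout.readline())

    os.kill(proc.pid, signal.SIGINT)  # like: kill -INT <main pid>
    proc.wait(timeout=timeout)

    try:
        os.kill(worker_pid, 0)  # signal 0 only checks that the PID exists
    except ProcessLookupError:
        worker_survived = False
    else:
        worker_survived = True
        os.kill(worker_pid, signal.SIGKILL)  # clean up the orphan
    proc.stdout.close()
    return proc.returncode, worker_survived
```

The second element of the result is the interesting one: the worker never
received the signal, so it keeps running until something kills it
explicitly.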

I looked briefly at the code and, unless I missed something, it seems that
in -D mode we do not set our own signal handlers. In interactive mode we
set signal handlers that simply call sys.exit(0).
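
For readers less familiar with Python signal handling, a handler of that
kind looks roughly like the sketch below. The function names and the choice
of SIGINT and SIGTERM are assumptions made for illustration; this is not
copied from the Airflow code base.

```python
import signal
import sys


def exit_cleanly(signum, frame):
    """Turn the signal into a normal interpreter exit with status 0."""
    sys.exit(0)


def install_exit_handlers(signals=(signal.SIGINT, signal.SIGTERM)):
    """Register exit_cleanly for each signal (must run in the main thread)."""
    for sig in signals:
        signal.signal(sig, exit_cleanly)
```

Note that such a handler only ends the process that receives the signal; it
does nothing for child processes unless the signal reaches them as well.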

I just wonder whether others know, or have looked in the past at, how this
is done and have some thoughts about it.

One way we could improve it (it has worked for me in the past): the
webserver/scheduler could start all their processes in their own new
process group and propagate every signal to that group before handling it.
That would work nicely in both interactive and daemon mode. Both the systemd
integration and manually sending a signal to the webserver/scheduler would
then kill all the processes spawned by the webserver/scheduler.
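
A minimal Python sketch of this proposal, under stated assumptions: all
function names are hypothetical, the "main process" is a forked stand-in
rather than a real webserver/scheduler, and only the case of a signal sent
to the main PID is exercised. It does not cover the interactive case, where
a process that leaves the terminal's foreground group would no longer
receive ^C directly.

```python
import os
import select
import signal
import subprocess
import sys


def become_group_leader():
    """Put this process into its own process group; children inherit it."""
    if os.getpgrp() != os.getpid():
        os.setpgid(0, 0)


def forward_to_group(signum, frame):
    """Re-send the signal to every process in our group, then exit."""
    signal.signal(signum, signal.SIG_IGN)  # ignore our own re-sent copy
    os.killpg(os.getpgrp(), signum)
    sys.exit(0)


def run_main_process(pipe_fd):
    """Hypothetical stand-in for a daemonized webserver/scheduler."""
    become_group_leader()
    for sig in (signal.SIGINT, signal.SIGTERM):
        signal.signal(sig, forward_to_group)
    # The worker inherits the process group and holds pipe_fd until it exits.
    subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        pass_fds=[pipe_fd],
    )
    os.write(pipe_fd, b"x")  # tell the observer that the worker is running
    while True:
        signal.pause()


def demo(timeout=10.0):
    """Signal only the main PID; return (main exit code, whole group gone?)."""
    read_fd, write_fd = os.pipe()
    main_pid = os.fork()
    if main_pid == 0:  # child: plays the role of the main process
        os.close(read_fd)
        code = 1
        try:
            run_main_process(write_fd)
        except SystemExit as exc:
            code = exc.code or 0
        finally:
            os._exit(code)
    os.close(write_fd)
    os.read(read_fd, 1)                # wait until the worker has started
    os.kill(main_pid, signal.SIGTERM)  # like: kill -TERM <main pid>
    _, status = os.waitpid(main_pid, 0)
    # End-of-file on the pipe means every holder of write_fd has exited.
    readable, _, _ = select.select([read_fd], [], [], timeout)
    group_gone = bool(readable) and os.read(read_fd, 1) == b""
    os.close(read_fd)
    return os.WEXITSTATUS(status), group_gone
```

Here `demo()` signals the main PID alone, yet the worker is terminated too,
because the handler re-sends the signal to the group before exiting.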

Let me know what you think about it.

J.

On Sat, Jan 4, 2020 at 12:38 PM Kaxil Naik <ka...@gmail.com> wrote:

> That is a good idea I think.

Re: Killing Webserver/Scheduler gracefully

Posted by Kaxil Naik <ka...@gmail.com>.
That is a good idea I think.

On Sat, Jan 4, 2020 at 11:33 AM Tomasz Urbaszek <tu...@apache.org>
wrote:

> For some time I have been thinking about adding "stop" commands like
> "airflow scheduler stop" and "airflow celery worker stop".
> What do you think? I have already done this in the native executor POC and
> it's helpful.
>
> T.

Re: Killing Webserver/Scheduler gracefully

Posted by Tomasz Urbaszek <tu...@apache.org>.
For some time I have been thinking about adding "stop" commands like
"airflow scheduler stop" and "airflow celery worker stop".
What do you think? I have already done this in the native executor POC and
it's helpful.

T.

On Sat, Jan 4, 2020 at 12:22 PM Kaxil Naik <ka...@gmail.com> wrote:

> Systemd integrations have worked nicely for me:
> https://airflow.apache.org/docs/stable/howto/run-with-systemd.html

Re: Killing Webserver/Scheduler gracefully

Posted by Kaxil Naik <ka...@gmail.com>.
Systemd integrations have worked nicely for me:
https://airflow.apache.org/docs/stable/howto/run-with-systemd.html



On Sat, Jan 4, 2020 at 11:01 AM Jarek Potiuk <Ja...@polidea.com>
wrote:

> I would like to bring up a topic from the users@ list:
>
> https://lists.apache.org/thread.html/5add5e8a19cb86ef2141d9d0634bd01c12d74a7655c4eddfa7b8e75a%40%3Cusers.airflow.apache.org%3E
>
>
> It seems some people have trouble shutting down the Airflow
> scheduler/webserver cleanly with signals. Is this already implemented, or
> does anyone have insight or experience with it that they can share before
> we dig deeper?
>
> I know Tomek recently gained some experience with shutting down workers
> cleanly and is looking into it, but I think it would be great to have
> working, documented shutdown scenarios for the scheduler/webserver: which
> signals work, how threads/processes behave when they receive them, etc.
>
> Does anyone have any insight into this?
>
> J.