Posted to dev@airflow.apache.org by Jarek Potiuk <Ja...@polidea.com> on 2019/03/10 19:48:55 UTC

Pods not being TERMinated properly for K8s airflow

NOTE! I changed the subject to not pollute the AIP-12 thread

Fokko,

I think I know why TERM signals do not work in the current POD
implementation. I have experienced this several times - a dockerized app
not receiving the TERM signal. The reason was always the same. It is not
actually a bug - it is expected behaviour when your ENTRYPOINT is in the
shell form ("/binary arg1 arg2") or when you use a shell script as the
first argument of the exec form (["shell-script", "arg"]).

In those cases the shell process becomes PID 1 - the init process.
Unfortunately, a shell is not prepared to do all the things a proper init
process should do:

* adopt orphaned child processes
* reap them (collect their exit status so they do not linger as zombies)
* handle and propagate signals properly
* wait until all subprocesses are terminated before terminating itself
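In miniature, those duties look roughly like the toy sketch below (purely
illustrative - this is a hugely simplified stand-in for what dumb-init or
tini actually do, and `mini_init` is a made-up name):

```python
import os
import signal

def mini_init(child_argv):
    """Toy sketch of a proper PID-1 init: run the real command as a
    child, forward signals to it, and reap it before exiting."""
    child_pid = os.fork()
    if child_pid == 0:
        # Child: become the actual application process.
        os.execvp(child_argv[0], child_argv)

    def forward(signum, frame):
        # Parent: do not die on TERM/INT; pass the signal on to the child.
        os.kill(child_pid, signum)

    signal.signal(signal.SIGTERM, forward)
    signal.signal(signal.SIGINT, forward)

    # Wait for (and thereby reap) the child, then propagate its status.
    _, status = os.waitpid(child_pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Real inits handle many more edge cases (adopted orphans, process groups,
SIGTSTP), which is exactly why dumb-init/tini exist.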

What happens is that the TERM signal just kills the PID-1 "shell" process,
but the signal never reaches any of its children and the container
continues to run. It's a well-known problem in the Docker world and there
are a number of solutions, including exec-ing the final process from the
shell script or - better - using dumb-init or tini in your ENTRYPOINT:
very small "proper" init implementations that do exactly what they should.

You can read more for example here: https://hynek.me/articles/docker-signals
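The exec-from-shell option can look like this (a hypothetical entrypoint
sketch, not any actual Airflow script):

```shell
#!/bin/sh
# Hypothetical entrypoint sketch: do any one-time setup in the shell,
# then `exec` the real command so it *replaces* the shell, becomes
# PID 1, and receives SIGTERM directly from Docker/Kubernetes.

# ... setup (env vars, config templating) would go here ...

# After this line no shell remains to swallow signals.
exec "$@"
```

This only covers the single-process case; as soon as the entrypoint needs
to keep running itself, you are back to needing a real init.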

J.

On Sun, Mar 10, 2019 at 7:29 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Ps. Jarek, interesting idea. It shouldn't be too hard to make Airflow more
> k8s native. You could package your DAGs within your container and do a
> rolling update. Add the DAGs as the last layer, and then point the DAGs
> folder to the same location. The hard part here is that you need to
> gracefully restart the workers. Currently, AFAIK, the signals given to the
> pod aren't respected. So when the scheduler/webserver/worker receives a
> SIGTERM, it should stop the jobs nicely and then exit the container, before
> k8s kills the container using a SIGKILL. This will be challenging with the
> workers, which are potentially long-running. Maybe stopping kicking off
> new jobs and letting the old ones finish will be good enough, but then we
> need to increase the standard kill timeout substantially. Having this would
> also enable autoscaling of the workers.
>
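Fokko's "stop kicking off new jobs, let the old ones finish" idea could be
sketched roughly like this (hypothetical names, not Airflow's actual worker
code):

```python
import os
import signal

class GracefulWorker:
    """Illustrative sketch: on SIGTERM, stop picking up new jobs, let
    the job already running finish, then exit before Kubernetes
    escalates to SIGKILL."""

    def __init__(self):
        self.shutting_down = False
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame):
        # Only set a flag; never interrupt a job mid-flight.
        self.shutting_down = True

    def run(self, jobs):
        finished = []
        for job in jobs:
            if self.shutting_down:
                break  # refuse new work once TERM has arrived
            finished.append(job())  # a running job always completes
        return finished
```

With something like this, the pod's terminationGracePeriodSeconds would
have to be raised to cover the longest-running task, exactly as noted above.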


-- 

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer

M: +48 660 796 129 <+48660796129>
E: jarek.potiuk@polidea.com

Re: Pods not being TERMinated properly for K8s airflow

Posted by Jarek Potiuk <Ja...@polidea.com>.
One small comment - in my current WIP multi-layered Dockerfile for AIP-7
(https://github.com/apache/airflow/pull/4543) I used dumb-init (developed
by Yelp) rather than tini. It's a bit better for the Python world, as it
can be installed from PyPI:

'dumb-init>=1.2.2'
(https://github.com/apache/airflow/pull/4543/files#diff-2eeaed663bd0d25b7e608891384b7298R311)

and the usage is also rather straightforward:

ENTRYPOINT ["/usr/local/bin/dumb-init", "--", "/entrypoint.sh"]

(https://github.com/apache/airflow/pull/4543/files#diff-3254677a7917c6c01f55212f86c57fbfR361)

One small caveat I noticed so far with using a proper "init" is that when
you want to run an interactive shell, you need to override --entrypoint -
otherwise the shell has no terminal to use and complains.
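For example, something along these lines (the image name below is just a
placeholder, not a real published image):

```shell
# Bypass the dumb-init/entrypoint wrapper to get an interactive shell;
# -it allocates the TTY the shell needs.
docker run -it --entrypoint /bin/bash my-airflow-image:latest
```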

J.


On Tue, Mar 12, 2019 at 4:41 AM Daniel Imberman <da...@gmail.com>
wrote:

> Oh yikes. Thanks for catching this @Jarek! @Greg Neiheisel
> <gr...@astronomer.io> were there a lot of caveats to deploying from tini
> or would this be a fairly straightforward fix?

Re: Pods not being TERMinated properly for K8s airflow

Posted by Daniel Imberman <da...@gmail.com>.
Oh yikes. Thanks for catching this @Jarek! @Greg Neiheisel
<gr...@astronomer.io> were there a lot of caveats to deploying from tini or
would this be a fairly straightforward fix?

On Sun, Mar 10, 2019 at 5:13 PM Greg Neiheisel <gr...@astronomer.io> wrote:

> I can confirm that wrapping airflow with tini/dumb-init works pretty well.
> We've been relying on it at Astronomer for the past year. We run
> exclusively on k8s and are restarting airflow pods very frequently. Here's
> our production 1.10.2 image using tini on alpine, for example:
> https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile#L102

Re: Pods not being TERMinated properly for K8s airflow

Posted by Greg Neiheisel <gr...@astronomer.io>.
I can confirm that wrapping airflow with tini/dumb-init works pretty well.
We've been relying on it at Astronomer for the past year. We run
exclusively on k8s and are restarting airflow pods very frequently. Here's
our production 1.10.2 image using tini on alpine, for example:
https://github.com/astronomer/astronomer/blob/master/docker/airflow/1.10.2/Dockerfile#L102


-- 
*Greg Neiheisel* / CTO Astronomer.io