Posted to dev@airflow.apache.org by Pramiti Goel <pr...@gmail.com> on 2018/10/14 10:56:27 UTC

Question on Running Airflow 1.10 in Kubernetes

Hi,

We are trying to run Airflow 1.10 in Kubernetes.
1) We run the scheduler, worker, and webserver services in individual
containers.
2) We use a Docker image that bundles Airflow 1.10 and Python 3.x, and we
deploy our DAGs inside that image.

With this architecture, whenever we deploy DAGs we need to build a new
Docker image, kill the currently running Airflow workers, and restart them
with the new image.

My question is: is killing the Airflow worker (stopping and starting the
worker service) many times a day good and advisable? What are the risks
involved if a worker doesn't shut down gracefully (which I have seen happen
quite a few times)?

Let me know if this is not the correct place to ask.

Thanks,
Pramiti

Re: Question on Running Airflow 1.10 in Kubernetes

Posted by Pramiti Goel <pr...@gmail.com>.
Hi Daniel,
Thank you for the reply. In our current deployment the Airflow workers run
in Kubernetes, but we are not using the Kubernetes operator in our DAGs, so
our workers are long-running pods. Since we restart/kill the workers
whenever we do a new deployment (adding or updating DAGs), I have the
following doubts.

Is killing the Airflow worker (stopping and starting the worker service)
many times a day good and advisable? What are the risks involved if a
worker doesn't shut down gracefully (which I have seen happen quite a few
times)?

Also, per the replies above, one known issue is updating a DAG definition
while its tasks are running. But by risk I also meant: if we kill a worker
in Kubernetes to do a new deployment (adding or updating DAGs) while tasks
are running, and the worker does not do a warm shutdown, can this lead to
zombie tasks, or to tasks whose status never gets updated?

Re: Question on Running Airflow 1.10 in Kubernetes

Posted by Michael Ghen <mi...@mikeghen.com>.
We have a similar setup with Kubernetes. We deploy (often several times a
day) while DAG runs are active, and yes, the deploys do kill running tasks.
Like a few others mentioned, we do a few things to mitigate any issues this
would cause:

1. DAGs are idempotent and can be rerun with no issues (we have a few
exceptions to this, so it goes)
2. We set retries on all DAGs, so when tasks are killed during a deploy
they retry before alerting us (a rough sketch is just below this list)
3. We log to a GCS bucket (remote logging config sketch at the end of this
message)
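
For illustration, here is a minimal sketch of how retries like these can be
set through default_args; the DAG id, schedule, and task below are
placeholders, not one of our real DAGs:

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

default_args = {
    "owner": "airflow",
    "retries": 3,                         # rerun tasks killed mid-deploy
    "retry_delay": timedelta(minutes=5),  # wait a bit before retrying
    "email_on_retry": False,              # only alert after retries are used up
    "email_on_failure": True,
}

dag = DAG(
    dag_id="example_retry_dag",           # placeholder name
    default_args=default_args,
    start_date=datetime(2018, 10, 1),
    schedule_interval="@hourly",
)

print_date = BashOperator(
    task_id="print_date",
    bash_command="date",
    dag=dag,
)

With retries set, a task interrupted by a worker restart is retried instead
of failing the run outright (once Airflow notices the attempt died), and we
only get emailed after the final attempt fails.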

We often do a few deployments in a day because we don't have our local
development environments set up as well as we should. We are getting better
at building and testing DAGs locally using Docker. Still, it's not uncommon
to do one or two deploys to production in a day. We have DAG runs every
hour, 24/7, and deploying while they're running hasn't been an issue given
the three precautions above.
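
Roughly, the remote logging part looks like this in airflow.cfg on 1.10;
the bucket path and connection id are placeholders, so double-check the
option names against your release:

[core]
# Ship task logs to a GCS bucket so they survive worker restarts.
# The bucket path and connection id below are placeholders.
remote_logging = True
remote_base_log_folder = gs://my-airflow-logs/logs
remote_log_conn_id = google_cloud_default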


Re: Question on Running Airflow 1.10 in Kubernetes

Posted by Jeff Payne <jp...@bombora.com>.
We have a similar Airflow system, except that everything is in the same container image. We use GCS for task log file storage, Cloud SQL Postgres for the Airflow DB, and conda to package our DAGs and dependencies. We redeploy the entire system any time we want to deploy new DAGs or changes to any existing DAGs, which works out to once every week or two, often in the middle of active DAG runs. We are careful to try to keep the DAGs idempotent, which helps. Regardless, being conscious of what the DAGs are doing at each stage also helps.
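
To give a rough idea of what we mean by idempotent (the DAG id, task, and
output path here are made up, not from our system): a task should rebuild
its output for the execution date from scratch and overwrite it, so
rerunning it after an interrupted deploy produces the same result.

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

def load_partition(ds, **kwargs):
    # Derive the day's output purely from the execution date (ds) and
    # overwrite the destination, so a rerun produces identical output.
    out_dir = "/tmp/events_daily"  # placeholder location
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, ds + ".csv"), "w") as f:
        # Mode "w" truncates any previous attempt instead of appending.
        f.write("date,count\n{},0\n".format(ds))  # stand-in for the real load

dag = DAG(
    dag_id="example_idempotent_dag",  # placeholder name
    start_date=datetime(2018, 10, 1),
    schedule_interval="@daily",
)

load = PythonOperator(
    task_id="load_events_partition",
    python_callable=load_partition,
    provide_context=True,  # Airflow 1.10 passes ds/context kwargs this way
    dag=dag,
)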

I'm curious about your use cases that require multiple deployments in a single day...


Re: Question on Running Airflow 1.10 in Kubernetes

Posted by Daniel Imberman <da...@gmail.com>.
Hi Pramiti,

We're in the process of allowing baked-in images for the k8s executor
(should be merged soon, possibly already merged). With this added you can
specify the worker image in airflow.cfg pretty easily. The only potential
issue with re-launching multiple times a day would be if a DAG was
mid-execution; otherwise it should be fine.
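
For reference, this is roughly what that looks like in the [kubernetes]
section of airflow.cfg in 1.10; the repository and tag are placeholders,
and dags_in_image is the baked-in-image option mentioned above, so check
the exact option names against your release since the feature was still
landing at the time:

[kubernetes]
# Worker pods are launched from this image; bump the tag on each deploy.
# The repository and tag below are placeholders.
worker_container_repository = gcr.io/my-project/airflow
worker_container_tag = 1.10.0-dags-20181015
dags_in_image = True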

With regard to worker failures with the k8s executor, you don't even need
to shut down the workers, since each worker pod only lasts as long as its
task does. We also use the k8s event stream to bubble up any worker
failures to the Airflow UI.
