Posted to dev@airflow.apache.org by Alex Van Boxel <al...@vanboxel.be> on 2016/12/02 12:55:35 UTC

Airflow-GCP for Google Container Engine

Hi all,

I think I've managed to put together a kind of "one-execute" install of
Airflow on Google Container Engine. This is my preferred setup because I can
have both a production and a staging environment, and it's what I use for
our production/staging setup. I think someone else could reproduce it with
everything in this repo:

https://github.com/alexvanboxel/airflow-gcp-k8s

It has a README that should be enough to get it set up.

Now I'll work on my sync'er so my DAGs will refresh as soon as I push to
git :-)

Re: Airflow-GCP for Google Container Engine

Posted by Koen Mevissen <km...@travix.com>.
Hey Alex, by resource management I meant the pod CPU and memory limits
<http://kubernetes.io/docs/admin/limitrange/> which can be set in k8s
deployments. By default a pod can use all available resources on the node
it's on. In theory this could mean a pod ends up by itself on 1 node when it
consumes a lot of resources.
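
To make the idea concrete, here is a minimal Python sketch that builds the
`resources` fragment of a container spec, with requests and limits for CPU
and memory. The field names follow the Kubernetes container spec; the
container name, image, and all the values are hypothetical and not taken
from the repo.

```python
import json


def container_resources(cpu_request, mem_request, cpu_limit, mem_limit):
    """Return the `resources` fragment of a Kubernetes container spec."""
    return {
        # What the scheduler reserves for the container on a node.
        "requests": {"cpu": cpu_request, "memory": mem_request},
        # The cap the container is not allowed to exceed.
        "limits": {"cpu": cpu_limit, "memory": mem_limit},
    }


# Hypothetical values for a light Airflow worker container.
worker_container = {
    "name": "airflow-worker",            # hypothetical container name
    "image": "example/airflow:latest",   # hypothetical image
    "resources": container_resources("250m", "512Mi", "500m", "1Gi"),
}

print(json.dumps(worker_container["resources"], indent=2, sort_keys=True))
```

The same fragment would sit under each container in a deployment's pod
template; leaving it out is what lets a pod use whatever the node has.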

I agree with your remark about delegation in GCP. I tried keeping workers
small so I could run multiple on a single node, and scale the number of
workers up/down accordingly. This failed for me because I had also applied
these resource limits to the scheduler and RabbitMQ pods. That resulted in
SIGTERM errors, most likely due to lack of resources.

Your ConfigMap is currently populated from a settings file; I think being
able to override it from env vars would also be very nice.
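
A rough Python sketch of what such an override could look like. Everything
here is an assumption for illustration: the KEY=VALUE settings format, the
`AIRFLOW_GCP_` prefix, and the key names are hypothetical and do not
describe how the repo actually builds its ConfigMap.

```python
import os


def load_settings(lines):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def apply_env_overrides(settings, environ=None, prefix="AIRFLOW_GCP_"):
    """Return a copy of settings where PREFIX+KEY env vars win over file values."""
    environ = os.environ if environ is None else environ
    merged = dict(settings)
    for key in settings:
        env_name = prefix + key  # hypothetical naming convention
        if env_name in environ:
            merged[key] = environ[env_name]
    return merged


# Hypothetical settings file content and environment.
file_lines = ["# settings", "WORKER_REPLICAS=1", "ENVIRONMENT=staging"]
fake_env = {"AIRFLOW_GCP_ENVIRONMENT": "production"}

config_map_data = apply_env_overrides(load_settings(file_lines), fake_env)
print(config_map_data)
```

Only keys already present in the settings file are overridden, so a stray
env var cannot add unexpected entries to the ConfigMap data.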

I'll definitely give this a try when I have some time!



On Mon, Dec 5, 2016 at 1:20 PM, Alex Van Boxel <al...@vanboxel.be> wrote:

> Hey Koen, I tried running it with 2 workers, but one is good enough for my
> use case. I don't know exactly what you mean by managing resources in this
> context, but on GCloud you want to delegate as much as possible to the
> services (Dataproc/Dataflow/BigQuery), and soon also to other Docker
> containers in the same cluster, so I like to keep the workers as light as
> possible. But good point about the max (I should make it configurable).
>
> For now my Kubernetes cluster is 2x4CPU, but it also runs other stuff
> besides Airflow.



-- 
Kind regards,
Met vriendelijke groet,

*Koen Mevissen*
Principal BI Developer


*Travix Nederland B.V.*
Piet Heinkade 55
1019 GM Amsterdam
The Netherlands

T. +31 (0)20 203 3241
E: KMevissen@travix.com
www.travix.com

*Brands: * CheapTickets  |  Vliegwinkel  |  Vayama  |  BudgetAir  |
 Flugladen

Re: Airflow-GCP for Google Container Engine

Posted by Alex Van Boxel <al...@vanboxel.be>.
Hey Koen, I tried running it with 2 workers, but one is good enough for my
use case. I don't know exactly what you mean by managing resources in this
context, but on GCloud you want to delegate as much as possible to the
services (Dataproc/Dataflow/BigQuery), and soon also to other Docker
containers in the same cluster, so I like to keep the workers as light as
possible. But good point about the max (I should make it configurable).

For now my Kubernetes cluster is 2x4CPU, but it also runs other stuff
besides Airflow.

On Mon, Dec 5, 2016 at 12:42 PM Koen Mevissen <km...@travix.com> wrote:

> Nice one, thanks! I tried this a while ago, and got stuck on resource
> issues - probably because I capped the pods' max resources too hard.
>
> I see in the worker YAML that it creates 1 worker replica; are you running
> it with multiple workers as well? Are you managing any resources on the
> pods, or do you let the workers consume whatever's available on the nodes?

Re: Airflow-GCP for Google Container Engine

Posted by Koen Mevissen <km...@travix.com>.
Nice one, thanks! I tried this a while ago, and got stuck on resource
issues - probably because I capped the pods' max resources too hard.

I see in the worker YAML that it creates 1 worker replica; are you running
it with multiple workers as well? Are you managing any resources on the
pods, or do you let the workers consume whatever's available on the nodes?


On Fri, Dec 2, 2016 at 10:14 PM, Chris Riccomini <cr...@apache.org>
wrote:

> Nice, thanks! :)




Re: Airflow-GCP for Google Container Engine

Posted by Chris Riccomini <cr...@apache.org>.
Nice, thanks! :)

On Fri, Dec 2, 2016 at 4:55 AM, Alex Van Boxel <al...@vanboxel.be> wrote:

> Hi all,
>
> I think I've managed to put together a kind of "one-execute" install of
> Airflow on Google Container Engine. This is my preferred setup because I can
> have both a production and a staging environment, and it's what I use for
> our production/staging setup. I think someone else could reproduce it with
> everything in this repo:
>
> https://github.com/alexvanboxel/airflow-gcp-k8s
>
> It has a README that should be enough to get it set up.
>
> Now I'll work on my sync'er so my DAGs will refresh as soon as I push to
> git :-)
>