Posted to dev@airflow.apache.org by Lance Norskog <la...@gmail.com> on 2016/05/19 21:37:54 UTC

How do you use pools?

How should we use pools in our dags?

We do a lot of analytics queries and copying between databases. I've set up
pools for each database instance so that we avoid overloading instances
with queries. Is this the right approach?
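
Concretely, the setup looks roughly like this (pool and connection names are
placeholders; the pools themselves are created under Admin -> Pools in the UI
with a slot count matched to what each instance can handle):

    from datetime import datetime
    from airflow import DAG
    # Import path may differ slightly across Airflow versions.
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('copy_between_dbs',
              start_date=datetime(2016, 5, 1),
              schedule_interval='@daily')

    # Every task that hits the analytics instance goes into its pool, so no
    # more than the pool's slot count of these queries run against it at once.
    extract = BashOperator(
        task_id='extract_orders',
        bash_command='psql -f extract_orders.sql',  # placeholder query command
        pool='analytics_db_pool',                   # hypothetical pool, e.g. 4 slots
        dag=dag)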

Thanks,

-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: How do you use pools?

Posted by Lance Norskog <la...@gmail.com>.
Ok, thanks.

Yes, there is a problem with over-subscribing pools. If your pool is set to
4, you can get 15 active tasks and another 20 waiting. This is still true
in 1.7.0.

Lance

On Thu, May 19, 2016 at 5:21 PM, Chris Riccomini <cr...@apache.org>
wrote:

> We do the same as well. BigQuery limits UDF usage to 6, so any DAG that
> uses a UDF goes in a pool (the 'udf' pool), which has a max of 6.
>
> On Thu, May 19, 2016 at 4:27 PM, siddharth anand <r3...@gmail.com> wrote:
>
> > Hi Lance!
> > Yes, we do the same. Specifically, we have multiple DAGs that share
> > access
> > to a Spark cluster through the use of Pools. By setting the pool size to
> > say 4, we remove the possibility of some backfill swamping the Spark
> > cluster. BTW, there were some bugs with over-subscription of pools. It's
> > not a common occurrence, but it has been reported.
> >
> > -s
> >
> > On Thu, May 19, 2016 at 9:37 PM, Lance Norskog <la...@gmail.com>
> > wrote:
> >
> > > How should we use pools in our dags?
> > >
> > > We do a lot of analytics queries and copying between databases. I've
> > > set up
> > > pools for each database instance so that we avoid overloading instances
> > > with queries. Is this the right approach?
> > >
> > > Thanks,
> > >
> > > --
> > > Lance Norskog
> > > lance.norskog@gmail.com
> > > Redwood City, CA
> > >
> >
>



-- 
Lance Norskog
lance.norskog@gmail.com
Redwood City, CA

Re: How do you use pools?

Posted by Chris Riccomini <cr...@apache.org>.
We do the same as well. BigQuery limits UDF usage to 6, so any DAG that
uses a UDF goes in a pool (the 'udf' pool), which has a max of 6.
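
For illustration, the pattern looks something like this (the 'udf' pool is
created ahead of time with 6 slots; the BigQuery operator lives under
airflow.contrib in the 1.x line and the arguments here are illustrative):

    from datetime import datetime
    from airflow import DAG
    # Contrib import path / argument names may differ by Airflow version.
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    dag = DAG('bq_udf_example', start_date=datetime(2016, 5, 1))

    udf_query = BigQueryOperator(
        task_id='run_udf_query',
        bql='SELECT * FROM my_udf_view',  # placeholder query that calls a UDF
        pool='udf',                       # shared pool capped at 6 slots
        dag=dag)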

On Thu, May 19, 2016 at 4:27 PM, siddharth anand <r3...@gmail.com> wrote:

> Hi Lance!
> Yes, we do the same. Specifically, we have multiple DAGs that share access
> to a Spark cluster through the use of Pools. By setting the pool size to
> say 4, we remove the possibility of some backfill swamping the Spark
> cluster. BTW, there were some bugs with over-subscription of pools. It's
> not a common occurrence, but it has been reported.
>
> -s
>
> On Thu, May 19, 2016 at 9:37 PM, Lance Norskog <la...@gmail.com>
> wrote:
>
> > How should we use pools in our dags?
> >
> > We do a lot of analytics queries and copying between databases. I've set
> > up
> > pools for each database instance so that we avoid overloading instances
> > with queries. Is this the right approach?
> >
> > Thanks,
> >
> > --
> > Lance Norskog
> > lance.norskog@gmail.com
> > Redwood City, CA
> >
>

Re: How do you use pools?

Posted by siddharth anand <r3...@gmail.com>.
Hi Lance!
Yes, we do the same. Specifically, we have multiple DAGs that share access
to a Spark cluster through the use of Pools. By setting the pool size to
say 4, we remove the possibility of some backfill swamping the Spark
cluster. BTW, there were some bugs with over-subscription of pools. It's
not a common occurrence, but it has been reported.
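
As a rough sketch of that setup (the pool name and submit command below are
made up; the 'spark' pool is created beforehand with 4 slots):

    from datetime import datetime
    from airflow import DAG
    # Import path may differ by Airflow version.
    from airflow.operators.bash_operator import BashOperator

    dag = DAG('spark_jobs', start_date=datetime(2016, 5, 1))

    # Every DAG that touches the cluster assigns its Spark tasks to the same
    # pool, so even a backfill can only hold 4 of its slots at a time.
    spark_job = BashOperator(
        task_id='run_spark_job',
        bash_command='spark-submit --master yarn my_job.py',  # placeholder
        pool='spark',
        dag=dag)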

-s

On Thu, May 19, 2016 at 9:37 PM, Lance Norskog <la...@gmail.com>
wrote:

> How should we use pools in our dags?
>
> We do a lot of analytics queries and copying between databases. I've set up
> pools for each database instance so that we avoid overloading instances
> with queries. Is this the right approach?
>
> Thanks,
>
> --
> Lance Norskog
> lance.norskog@gmail.com
> Redwood City, CA
>