Posted to dev@airflow.apache.org by Manish Trivedi <tr...@gmail.com> on 2018/04/06 20:46:16 UTC

Slot pools correct usage

Hi Airflow devs,

I have a use case to limit the number of calls to a certain database. I am
using a pool along with priority_weight to schedule the tasks into the slot
pool. I have around 5 operators that I need to execute in serial order across
different dags.

The slot pool is created with "1" slot to ensure sequential execution. I am
not able to achieve the desired behavior with the current setup.
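
For readers following the thread, here is a minimal sketch of what a pool with a single slot does guarantee: at most one task holds the slot at any moment. It is a toy model in plain Python, not Airflow code. The function run_with_pool, the task names, and the sleep that stands in for the database query are all hypothetical; in Airflow itself the equivalent setup is a pool with one slot plus the pool argument on each operator.

```python
import threading
import time


def run_with_pool(task_names, slots=1):
    """Run each task in its own thread, gated by a pool of `slots` slots.

    Returns the highest number of tasks seen running at the same time.
    This is a toy stand-in for a slot pool, not Airflow's implementation.
    """
    pool = threading.BoundedSemaphore(slots)  # stands in for the slot pool
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def task(name):
        with pool:  # block until a slot is free
            with lock:
                state["running"] += 1
                state["peak"] = max(state["peak"], state["running"])
            time.sleep(0.01)  # stands in for the expensive database query
            with lock:
                state["running"] -= 1

    threads = [threading.Thread(target=task, args=(n,)) for n in task_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["peak"]


# Five hypothetical queries sharing a one-slot pool never overlap.
peak_one_slot = run_with_pool(["q1", "q2", "q3", "q4", "q5"], slots=1)
```

Note that the sketch says nothing about which task gets the slot first; it only caps concurrency.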

Re: Slot pools correct usage

Posted by Brian Greene <br...@heisenbergwoodworking.com>.
So what’s it doing (your config)? Does it work if you don’t use pools? What about if the pool is of size 2? What if just one dag runs? Have you ever seen this query work, or did it stop working only after you started messing with pools?

I use 1 pool, no priority (I don’t care about sequence), and it “throttles” fine...

Which executor are you using?  I’m not familiar enough with the intricacies to know if the pool settings are honored with different executors, but I’m using CeleryExecutor with success.

B

Sent from a device with less than stellar autocorrect


Re: Slot pools correct usage

Posted by Manish Trivedi <tr...@gmail.com>.
Hi Brian,

I really appreciate your quick reply. Just to be clear, I did not intend to
run them in a particular order. As a matter of fact, these are expensive db
queries that I can't afford to run in parallel.
I think I have set up the tasks correctly to use the pool, but I may not have
set priority_weight correctly. I would appreciate it if you could share your
configs so I can check whether I am missing something simple.

Thanks much,
Manish


Re: Slot pools correct usage

Posted by Brian Greene <br...@heisenbergwoodworking.com>.
To be clear, you’re hoping that setting the slots to 1 will cause the tasks across distinct dags to run in order, based on the assumption that they’ll queue up and then execute off the pool?

I don’t think it will quite work that way - there’s no guarantee the scheduler will execute your tasks across dags in any particular sequence, and if one is “faster” than the other, they certainly won’t “line up”. Thus, there’s no way to ensure they’ll queue in the right order.

I successfully use pools across many dags to limit access to an expensive resource and it works really well, but my design doesn’t require them to execute in any particular order, and each task is idempotent.
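
A small simulation may make this point concrete. It is a toy model written for this thread, not Airflow's scheduler: simulate_one_slot_pool and the task tuples are hypothetical, and the only behavior borrowed from Airflow is that, among tasks waiting for a slot, a higher priority_weight goes first. The sketch assumes the pool can only choose among tasks that have already been queued when the slot frees up.

```python
import heapq


def simulate_one_slot_pool(tasks):
    """Simulate a one-slot pool (toy model, not Airflow's scheduler).

    tasks: list of (name, arrival_time, duration, priority_weight).
    Returns the task names in the order they started running.
    """
    pending = sorted(tasks, key=lambda t: t[1])  # ordered by arrival time
    queued = []  # heap of (-priority_weight, arrival_time, name, duration)
    started = []
    clock = 0
    i = 0
    while i < len(pending) or queued:
        # Queue every task that has arrived by the current time.
        while i < len(pending) and pending[i][1] <= clock:
            name, arrival, duration, weight = pending[i]
            heapq.heappush(queued, (-weight, arrival, name, duration))
            i += 1
        if not queued:
            clock = pending[i][1]  # slot sits idle until the next arrival
            continue
        # The slot is free: the highest-weight queued task takes it.
        _, _, name, duration = heapq.heappop(queued)
        started.append(name)
        clock += duration
    return started


# Weights ask for step_1, then step_2, then step_3.
all_ready = simulate_one_slot_pool([
    ("step_1", 0, 10, 5),
    ("step_2", 0, 10, 4),
    ("step_3", 0, 10, 3),
])

# Same weights, but step_2's dag gets there first and takes the slot.
staggered = simulate_one_slot_pool([
    ("step_1", 5, 10, 5),
    ("step_2", 0, 10, 4),
    ("step_3", 2, 10, 3),
])
```

In this model the weights produce the intended order only when every task is already waiting at the moment the slot frees up; once arrival times differ, whichever task is queued first can jump ahead.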

I’m curious as to your design/constraints - could you elaborate?

Brian

Sent from a device with less than stellar autocorrect
