Posted to dev@airflow.apache.org by Andreas Koeltringer <an...@n-fuse.co> on 2018/08/08 16:14:34 UTC

SubdagOperator and Pools

Hi,

We have a SubdagOperator with lots of tasks in it. We want to limit the 
parallelism with which these tasks execute. Therefore we created a pool 
and added the tasks within the SubdagOperator to this pool.

However, this setting is not respected (see image attached).

Now we are wondering why that is. In 'subdag_operator.py' on the master 
branch there is a comment that

     "Airflow pool is not honored by SubDagOperator."

This comment is not in the file in v1.9.0 (which I am using).

Does this mean that pools are not respected for Subdags?

On the other hand, it also states that Subdags use the SequentialExecutor. 
Shouldn't that execute tasks sequentially?

Can anyone clarify this, please?
And if pools do not work, what options do we have to limit parallelism 
in a Subdag?

Thanks in advance,
Andreas
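
What Andreas expects from a pool can be illustrated with a small toy model. The sketch below assumes every task is ready at the same time and takes a whole number of time ticks; `Pool` and `simulate_pool` are hypothetical names for illustration only, not Airflow's API or implementation. Under this model, a pool with N slots never has more than N of its tasks running at once:

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Pool:
    """Hypothetical stand-in for a pool: a name and a fixed number of slots."""
    name: str
    slots: int


def simulate_pool(durations: Sequence[int], pool: Pool) -> Tuple[int, int]:
    """Run tasks through a slot-limited pool in discrete time ticks.

    Each entry in `durations` is one task's run time in ticks (>= 1). All
    tasks are ready at tick 0; a task starts only when a slot is free and
    releases its slot when it finishes. Returns (peak_running, makespan).
    """
    if pool.slots < 1:
        raise ValueError("a pool needs at least one slot")
    pending = list(durations)  # tasks waiting for a slot, in submission order
    running = []               # remaining ticks of each running task
    peak = 0
    ticks = 0
    while pending or running:
        # Fill every free slot with the next waiting task.
        while pending and len(running) < pool.slots:
            running.append(pending.pop(0))
        peak = max(peak, len(running))
        # One tick passes; tasks with 1 tick left finish and free their slot.
        running = [r - 1 for r in running if r > 1]
        ticks += 1
    return peak, ticks


# Ten one-tick tasks in a 3-slot pool: at most 3 run at once, 4 ticks total.
print(simulate_pool([1] * 10, Pool("subdag_pool", 3)))  # (3, 4)
```

The observation in the question is exactly the case this model rules out: more tasks from the pool running at once than the pool has slots.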

Re: SubdagOperator and Pools

Posted by Andreas Koeltringer <an...@n-fuse.co>.
Hi,

to clarify, I created a Gist with instructions for how to reproduce this 
issue:

https://gist.github.com/akoeltringer/63fcf0340ae219c112b2a5377e6d2715

thanks, regards
Andreas


On 08/09/2018 07:41 AM, Andreas Koeltringer wrote:
> Hi Tao,
> 
> thanks for your response.
> 
> That's just the thing: I am talking about ONE SubdagOperator: the tasks 
> within it execute in parallel. That's what confuses me.
> 
> 
> Kind regards,
> Andreas

Re: SubdagOperator and Pools

Posted by Andreas Koeltringer <an...@n-fuse.co>.
Hi Tao,

thanks for your response.

That's just the thing: I am talking about ONE SubdagOperator: the tasks 
within it execute in parallel. That's what confuses me.


Kind regards,
Andreas


On 08/08/2018 06:41 PM, Tao Feng wrote:
> Hi Andreas,
> 
> The default executor for SubdagOperator is the SequentialExecutor, which
> makes sure all the tasks within a subdag are executed in sequential order.
> But if you have too many subdags within a single DAG and want to control
> them with pooling (https://airflow.apache.org/concepts.html#pools),
> SubdagOperator unfortunately doesn't respect pooling
> (https://issues.apache.org/jira/browse/AIRFLOW-2371) at the moment. My
> understanding is that Airflow uses the backfill scheduler to schedule
> SubdagOperator instead of the normal scheduler, and the backfill scheduler
> has certain discrepancies with the normal scheduler in its pooling support.
> 
> Best,
> -Tao

-- 
Andreas Koeltringer
Mail:   andreas.koeltringer@n-fuse.co
Mobile: +49 173 7060379

n-fuse GmbH
Ossietzkystrasse 4
70174 Stuttgart
Germany

Managing Director: Thomas Hoppe
Commercial register: Amtsgericht Stuttgart HRB 736379


Re: SubdagOperator and Pools

Posted by Tao Feng <fe...@gmail.com>.
Hi Andreas,

The default executor for SubdagOperator is the SequentialExecutor, which
makes sure all the tasks within a subdag are executed in sequential order.
But if you have too many subdags within a single DAG and want to control
them with pooling (https://airflow.apache.org/concepts.html#pools),
SubdagOperator unfortunately doesn't respect pooling
(https://issues.apache.org/jira/browse/AIRFLOW-2371) at the moment. My
understanding is that Airflow uses the backfill scheduler to schedule
SubdagOperator instead of the normal scheduler, and the backfill scheduler
has certain discrepancies with the normal scheduler in its pooling support.

Best,
-Tao
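
Tao's explanation can be condensed into a deliberately simplified model. The helper `peak_concurrency` is hypothetical and does not reflect Airflow's actual scheduler or backfill code; it only encodes the two limits he describes. The normal scheduler queues a task only while its pool has a free slot, whereas the backfill path hands every ready task to the executor, so the executor's own parallelism is the only remaining limit:

```python
def peak_concurrency(num_ready: int, pool_slots: int,
                     executor_parallelism: int, honor_pool: bool) -> int:
    """Return how many of `num_ready` tasks would run at the same time.

    Toy model only (hypothetical helper, not Airflow code):
    - honor_pool=True models the normal scheduler, which queues a task
      only while its pool has a free slot.
    - honor_pool=False models the backfill path Tao describes, where pool
      slots are not checked and every ready task reaches the executor.
    - executor_parallelism is how many queued tasks the executor runs at
      once; a SequentialExecutor corresponds to 1.
    """
    queued = min(num_ready, pool_slots) if honor_pool else num_ready
    return min(queued, executor_parallelism)


# 20 independent tasks in one subdag, all assigned to a 4-slot pool.
print(peak_concurrency(20, 4, 1, honor_pool=False))   # 1: sequential executor
print(peak_concurrency(20, 4, 32, honor_pool=False))  # 20: pool ignored
print(peak_concurrency(20, 4, 32, honor_pool=True))   # 4: pool honored
```

In this model, the parallel execution reported for a single SubdagOperator can only occur when the executor running the subdag has a parallelism greater than one.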
