Posted to dev@airflow.apache.org by David Muñoz <da...@gmail.com> on 2019/12/29 17:50:23 UTC

[AirFlow]: Use of SubDags

Hi all,

Apologies if this topic has already been discussed.

I want to build a data pipeline, and subdags seem perfect because they
let me group the phases / tasks by functional meaning. However, the
documentation and other people's experiences on the internet strongly
recommend avoiding them. What do you think?

Thanks in advance.

Kind regards.

David.

Re: [AirFlow]: Use of SubDags

Posted by Chao-Han Tsai <mi...@gmail.com>.
Hi David,

A few reasons why SubDagOperator was strongly discouraged in the past:

   - No concurrency control: you could not limit the number of parallel
   tasks in the subdag via a pool or DAG concurrency. SubDagOperator used to
   rely on the backfill scheduler, which had no concurrency control.
   We added concurrency control to the backfill scheduler recently,
   so this should not be an issue in the most recent release.
   - Deadlock. SubDagOperator occupies one task slot to launch a
   backfill process that schedules the tasks in the subdag. If the parent
   DAG does not have enough DAG concurrency, all the slots can end up
   taken by SubDagOperators. This should be fixed in the latest master,
   since SubDagOperator now uses the Airflow core scheduler to launch
   tasks, but I am not sure whether that has been released.
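
To make the deadlock scenario concrete, here is a toy model in plain Python. It is not Airflow code: the function name `simulate` and the single shared slot counter are assumptions made for illustration, and the real scheduler is considerably more involved. The model assumes each SubDagOperator holds one slot until all of its child tasks finish, and that child tasks need a slot from the same limited set.

```python
def simulate(num_subdags, tasks_per_subdag, slots):
    """Return True if every subdag finishes, False if the run deadlocks.

    Toy model only: operators and child tasks share one set of slots,
    and each child task takes exactly one tick to run.
    """
    pending = num_subdags   # SubDagOperators not yet started
    running = []            # child tasks left for each running operator
    free = slots            # slots not held by an operator
    finished = 0
    while finished < num_subdags:
        # Greedily start operators; each one holds a slot while it runs.
        while pending > 0 and free > 0:
            pending -= 1
            free -= 1
            running.append(tasks_per_subdag)
        # No free slot and no operator ready to release one: deadlock.
        if free == 0 and not any(r == 0 for r in running):
            return False
        # One tick: child tasks use whatever slots the operators left over.
        budget = free
        for i, left in enumerate(running):
            ran = min(left, budget)
            running[i] -= ran
            budget -= ran
        # Operators whose children are all done release their slot.
        done = sum(1 for left in running if left == 0)
        finished += done
        free += done
        running = [left for left in running if left > 0]
    return True


if __name__ == "__main__":
    print(simulate(num_subdags=3, tasks_per_subdag=2, slots=3))
    print(simulate(num_subdags=3, tasks_per_subdag=2, slots=4))
```

With three operators and three slots, every slot is held by an operator that is waiting on children that can never start. Adding a single extra slot is enough for the run to complete in this model.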

Chao-Han



Re: [AirFlow]: Use of SubDags

Posted by James Coder <jc...@gmail.com>.
I find them to be very useful. I think they are an easy way to group a set of tasks that have a one-to-many-to-one dependency structure. Using a subdag to group the "many" into a single task makes for a much cleaner DAG and makes it easier to see the DAG's status.
I read many of the same “warnings” when first implementing Airflow but decided to give it a try. I rarely see any issues, but I also have a pretty small footprint, running only about 20 DAGs.
I also use the Celery Executor and assign all my subdags to their own pool to try to make sure they don’t take up too many worker slots. I’m not sure how much of a difference that makes, but it’s there.
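
As a rough illustration of why a dedicated pool might help, here is a toy model in plain Python. It is not Airflow code: `simulate_with_pool`, `worker_slots`, and `subdag_pool` are hypothetical names, and the model assumes each running subdag operator holds one worker slot until its child tasks finish, with child tasks competing for the same worker slots.

```python
def simulate_with_pool(num_subdags, tasks_per_subdag, worker_slots, subdag_pool):
    """Return True if all subdags finish, False if the run stalls.

    worker_slots: slots shared by subdag operators and their child tasks.
    subdag_pool:  cap on how many subdag operators may run at once.
    Toy model only; each child task takes exactly one tick.
    """
    pending = num_subdags
    running = []            # child tasks left for each running operator
    free = worker_slots
    finished = 0
    while finished < num_subdags:
        # Start an operator only if a worker slot AND a pool slot are free.
        started = 0
        while pending > 0 and free > 0 and len(running) < subdag_pool:
            pending -= 1
            free -= 1
            running.append(tasks_per_subdag)
            started += 1
        # One tick: child tasks run in the slots the operators left over.
        budget = free
        ran_total = 0
        for i, left in enumerate(running):
            ran = min(left, budget)
            running[i] -= ran
            budget -= ran
            ran_total += ran
        # Operators whose children are all done release their worker slot.
        done = sum(1 for left in running if left == 0)
        if started == 0 and ran_total == 0 and done == 0:
            return False    # nothing can make progress any more
        finished += done
        free += done
        running = [left for left in running if left > 0]
    return True


if __name__ == "__main__":
    # Pool as large as the worker count: operators can take every slot.
    print(simulate_with_pool(3, 2, worker_slots=3, subdag_pool=3))
    # Pool smaller than the worker count: one slot stays free for children.
    print(simulate_with_pool(3, 2, worker_slots=3, subdag_pool=2))
```

In this model the pool only helps when it is smaller than the number of worker slots, so that at least one slot is always left for child tasks.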

James Coder
