Posted to commits@airflow.apache.org by po...@apache.org on 2022/01/23 13:22:14 UTC

[airflow] 20/24: Compare taskgroup and subdag (#20700)

This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 7cd3fd68fbee97ab84420e70d3f17cd6a21b9e84
Author: Alan Ma <al...@gmail.com>
AuthorDate: Sun Jan 9 13:58:26 2022 -0800

    Compare taskgroup and subdag (#20700)
    
    (cherry picked from commit 6b0c52898555641059e149c5ff0d9b46b2d45379)
---
 docs/apache-airflow/concepts/dags.rst | 43 +++++++++++++++++++++++++++++++++--
 1 file changed, 41 insertions(+), 2 deletions(-)

diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 8aa4955..8d9b387 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -605,8 +605,47 @@ Some other tips when using SubDAGs:
 
 See ``airflow/example_dags`` for a demonstration.
 
-Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so
-resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+.. note::
+
+    Parallelism is *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, so SubDagOperators can consume resources beyond any limits you may have set.
+
+
+
+TaskGroups vs SubDAGs
+----------------------
+
+SubDAGs, while serving a similar purpose to TaskGroups, introduce both performance and functional issues because of how they are implemented.
+
+* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations and can oversubscribe the worker environment.
+* SubDAGs have their own DAG attributes. When a SubDAG's attributes are inconsistent with those of its parent DAG, unexpected behavior can occur.
+* The "full" DAG cannot be seen in one view, because each SubDAG exists as a full-fledged DAG of its own.
+* SubDAGs introduce a range of edge cases and caveats, which can disrupt user experience and expectations.
+
+TaskGroups, on the other hand, are a better option because a TaskGroup is purely a UI grouping concept. All tasks within a TaskGroup still behave like any other tasks outside of it.
+
+The following table summarizes the core differences between the two constructs.
+
++--------------------------------------------------------+--------------------------------------------------------+
+| TaskGroup                                              | SubDAG                                                 |
++========================================================+========================================================+
+| Repeating patterns as part of the same DAG             |  Repeating patterns as a separate DAG                  |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of views and statistics for the DAG            |  Separate set of views and statistics between parent   |
+|                                                        |  and child DAGs                                        |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of DAG configuration                           |  Several sets of DAG configurations                    |
++--------------------------------------------------------+--------------------------------------------------------+
+| Honors parallelism configurations through existing     |  Does not honor parallelism configurations due to      |
+| SchedulerJob                                           |  newly spawned BackfillJob                             |
++--------------------------------------------------------+--------------------------------------------------------+
+| Simple construct declaration with context manager      |  Complex DAG factory with naming restrictions          |
++--------------------------------------------------------+--------------------------------------------------------+
+
+.. note::
+
+    SubDAG is deprecated, so TaskGroup is always the preferred choice.
+
 
 
 Packaging DAGs
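
Editorial sketch (not part of the commit above): the comparison added by this patch describes a TaskGroup as "purely a UI grouping concept" that is declared with a context manager, with its tasks remaining part of the same DAG. The toy model below illustrates that idea in plain Python. It does not use Airflow and is not Airflow's implementation; ``ToyDag``, ``task_group`` and ``add_task`` are hypothetical names invented for this illustration, and the ``group.task`` id format is an assumption of the sketch.

```python
from contextlib import contextmanager


class ToyDag:
    """Toy stand-in for a DAG: one flat task registry plus display-only groups."""

    def __init__(self, dag_id):
        self.dag_id = dag_id
        self.tasks = {}       # task_id -> callable; the single set of tasks in this DAG
        self.groups = {}      # group_id -> list of task_ids; used only for display
        self._active = None   # group currently opened with `with`, if any

    @contextmanager
    def task_group(self, group_id):
        # Entering the block only records which group is active.
        # No second DAG and no separate job is created.
        previous = self._active
        self.groups.setdefault(group_id, [])
        self._active = group_id
        try:
            yield group_id
        finally:
            self._active = previous

    def add_task(self, name, fn):
        # Grouped tasks get a prefixed id but land in the same registry
        # as every other task of the DAG.
        task_id = f"{self._active}.{name}" if self._active else name
        if task_id in self.tasks:
            raise ValueError(f"duplicate task id: {task_id}")
        self.tasks[task_id] = fn
        if self._active:
            self.groups[self._active].append(task_id)
        return task_id


dag = ToyDag("example")
dag.add_task("start", lambda: "start")
with dag.task_group("section_1"):
    dag.add_task("task_1", lambda: 1)
    dag.add_task("task_2", lambda: 2)
dag.add_task("end", lambda: "end")

print(list(dag.tasks))
print(dag.groups)
```

In this sketch the group only affects naming and display: all four tasks sit in one registry, which mirrors the "one set of views and statistics" and "one set of DAG configuration" rows of the table. A SubDAG, by contrast, would correspond to a second ``ToyDag`` with its own registry and settings.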