Posted to commits@airflow.apache.org by po...@apache.org on 2022/01/23 13:22:14 UTC
[airflow] 20/24: Compare taskgroup and subdag (#20700)
This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch v2-2-test
in repository https://gitbox.apache.org/repos/asf/airflow.git
commit 7cd3fd68fbee97ab84420e70d3f17cd6a21b9e84
Author: Alan Ma <al...@gmail.com>
AuthorDate: Sun Jan 9 13:58:26 2022 -0800
Compare taskgroup and subdag (#20700)
(cherry picked from commit 6b0c52898555641059e149c5ff0d9b46b2d45379)
---
docs/apache-airflow/concepts/dags.rst | 43 +++++++++++++++++++++++++++++++++--
1 file changed, 41 insertions(+), 2 deletions(-)
diff --git a/docs/apache-airflow/concepts/dags.rst b/docs/apache-airflow/concepts/dags.rst
index 8aa4955..8d9b387 100644
--- a/docs/apache-airflow/concepts/dags.rst
+++ b/docs/apache-airflow/concepts/dags.rst
@@ -605,8 +605,47 @@ Some other tips when using SubDAGs:
See ``airflow/example_dags`` for a demonstration.
-Note that :doc:`pools` are *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so
-resources could be consumed by SubdagOperators beyond any limits you may have set.
+
+.. note::
+
+    Parallelism is *not honored* by :class:`~airflow.operators.subdag.SubDagOperator`, and so resources could be consumed by SubDagOperators beyond any limits you may have set.
+
+
+
+TaskGroups vs SubDAGs
+----------------------
+
+SubDAGs, while serving a similar purpose to TaskGroups, introduce both performance and functional issues due to their implementation.
+
+* The SubDagOperator starts a BackfillJob, which ignores existing parallelism configurations, potentially oversubscribing the worker environment.
+* SubDAGs have their own DAG attributes. When a SubDAG's DAG attributes are inconsistent with those of its parent DAG, unexpected behavior can occur.
+* The "full" DAG cannot be seen in one view, because each SubDAG exists as a full-fledged DAG.
+* SubDAGs introduce a range of edge cases and caveats, which can disrupt user experience and expectations.
+
+TaskGroups, on the other hand, are a better option given that they are purely a UI grouping concept. All tasks within a TaskGroup still behave like any other tasks outside of the TaskGroup.
+
+The following table summarizes the core differences between these two constructs.
+
++--------------------------------------------------------+--------------------------------------------------------+
+| TaskGroup                                              | SubDAG                                                 |
++========================================================+========================================================+
+| Repeating patterns as part of the same DAG             | Repeating patterns as a separate DAG                   |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of views and statistics for the DAG            | Separate set of views and statistics between parent    |
+|                                                        | and child DAGs                                         |
++--------------------------------------------------------+--------------------------------------------------------+
+| One set of DAG configuration                           | Several sets of DAG configurations                     |
++--------------------------------------------------------+--------------------------------------------------------+
+| Honors parallelism configurations through existing     | Does not honor parallelism configurations due to       |
+| SchedulerJob                                           | newly spawned BackfillJob                              |
++--------------------------------------------------------+--------------------------------------------------------+
+| Simple construct declaration with context manager      | Complex DAG factory with naming restrictions           |
++--------------------------------------------------------+--------------------------------------------------------+
+
+.. note::
+
+    SubDAG is deprecated, so TaskGroup is always the preferred choice.
+
Packaging DAGs