Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2020/12/01 17:06:22 UTC

[GitHub] [airflow] VeenaArv opened a new pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

VeenaArv opened a new pull request #12741:
URL: https://github.com/apache/airflow/pull/12741


   
   
   closes: 12298
   link: https://github.com/apache/airflow/issues/12298
   
   Adds a section under TaskGroup to compare it to SubDAGs.  
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] jhtimmins commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
jhtimmins commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-833240069


   Hi @VeenaArv, are you still interested in getting this merged?





[GitHub] [airflow] boring-cyborg[bot] commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
boring-cyborg[bot] commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-736688127


   Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
   Here are some useful points:
   - Pay attention to the quality of your code (flake8, pylint and type annotations). Our [pre-commits]( https://github.com/apache/airflow/blob/master/STATIC_CODE_CHECKS.rst#prerequisites-for-pre-commit-hooks) will help you with that.
   - In case of a new feature add useful documentation (in docstrings or in `docs/` directory). Adding a new operator? Check this short [guide](https://github.com/apache/airflow/blob/master/docs/howto/custom-operator.rst). Consider adding an example DAG that shows how users should use it.
   - Consider using [Breeze environment](https://github.com/apache/airflow/blob/master/BREEZE.rst) for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
   - Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
   - Please follow [ASF Code of Conduct](https://www.apache.org/foundation/policies/conduct) for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
   - Be sure to read the [Airflow Coding style]( https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#coding-style-and-best-practices).
   Apache Airflow is a community-driven project and together we are making it better 🚀.
   In case of doubts contact the developers at:
   Mailing List: dev@airflow.apache.org
   Slack: https://s.apache.org/airflow-slack
   





[GitHub] [airflow] github-actions[bot] commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-793199285


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.





[GitHub] [airflow] ryw commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
ryw commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-737280674


   I'd recommend stronger language against subdags. I know @mistercrunch has an opinion on this that we could add.





[GitHub] [airflow] potiuk commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
potiuk commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-739792222


   Should we complete the review before 2.0.0rc1 tomorrow? @ryw  @VeenaArv @turbaszek -> this one needs some love I think.





[GitHub] [airflow] turbaszek commented on a change in pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
turbaszek commented on a change in pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#discussion_r533589030



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1022,6 +1023,28 @@ This animated gif shows the UI interactions. TaskGroups are expanded or collapse
 
 .. image:: img/task_group.gif
 
+While TaskGroups and SubDAGs are both used to create repeating patterns, depending on your use case, one may be better
+than the other. The SubDagOperator launches a DAG as a separate entity from the original graph. This design pattern
+offers flexibility to create SubDAGs with different schedulers and executors at the cost of greater complexity and
+maintenance burden. TaskGroups creates a UI grouping concept on the same original DAG which simplifies logic and
+maintenance for less flexibility.

Review comment:
       How can users schedule SubDAGs using a different scheduler or executor?







[GitHub] [airflow] zuk commented on a change in pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
zuk commented on a change in pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#discussion_r562715639



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1022,6 +1023,28 @@ This animated gif shows the UI interactions. TaskGroups are expanded or collapse
 
 .. image:: img/task_group.gif
 
+While TaskGroups and SubDAGs are both used to create repeating patterns, depending on your use case, one may be better
+than the other. The SubDagOperator launches a DAG as a separate entity from the original graph. This design pattern
+offers flexibility to create SubDAGs with different schedulers and executors at the cost of greater complexity and
+maintenance burden. TaskGroups creates a UI grouping concept on the same original DAG which simplifies logic and
+maintenance for less flexibility.
+
+Here is a table that summarizes their differences:
++----------------------+----------------------+
+| Task Group           | SubDAG               |
++======================+======================+
+| Repeating patterns   |  Repeating patterns  |
+| live on the same DAG |  run as separate DAGs|
++----------------------+----------------------+
+| Follows schedule of  |  Creates a new       |
+| DAG                  |  schedule            |
++----------------------+----------------------+
+| Has same executor as |  Can specify an      |
+| DAG                  |  executor            |
++----------------------+----------------------+
+| Honors all pool      |  Does not honor pool |
+| configurations       |  configurations      |
++----------------------+----------------------+

Review comment:
       This is very helpful, but one thing is still not clear to me from these docs: Since SubDAGs can have different schedules, how does a SubDAG's execution trigger or block executions of tasks further down in the parent DAG?
   
   For example, if a SubDAG is scheduled to execute every 1 hour, and the parent DAG is scheduled every 20 minutes, will the SubDAG be executed every 20 minutes?
   
   Or if the schedules were reversed (SubDAG every 20 minutes and parent DAG every hour), how do the SubDAG's multiple executions in that hour factor into the parent DAG's eventual execution?







[GitHub] [airflow] github-actions[bot] commented on pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#issuecomment-736741625


   [The Workflow run](https://github.com/apache/airflow/actions/runs/394381602) is cancelling this PR. It has some failed jobs matching ^Pylint$,^Static checks,^Build docs$,^Spell check docs$,^Backport packages$,^Provider packages,^Checks: Helm tests$,^Test OpenAPI*.





[GitHub] [airflow] github-actions[bot] closed pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
github-actions[bot] closed pull request #12741:
URL: https://github.com/apache/airflow/pull/12741


   





[GitHub] [airflow] dinigo commented on a change in pull request #12741: Adds Docs to compare SubDAGs and TaskGroups

Posted by GitBox <gi...@apache.org>.
dinigo commented on a change in pull request #12741:
URL: https://github.com/apache/airflow/pull/12741#discussion_r629970942



##########
File path: docs/apache-airflow/concepts.rst
##########
@@ -1022,6 +1023,28 @@ This animated gif shows the UI interactions. TaskGroups are expanded or collapse
 
 .. image:: img/task_group.gif
 
+While TaskGroups and SubDAGs are both used to create repeating patterns, depending on your use case, one may be better
+than the other. The SubDagOperator launches a DAG as a separate entity from the original graph. This design pattern
+offers flexibility to create SubDAGs with different schedulers and executors at the cost of greater complexity and
+maintenance burden. TaskGroups creates a UI grouping concept on the same original DAG which simplifies logic and
+maintenance for less flexibility.
+
+Here is a table that summarizes their differences:
++----------------------+----------------------+
+| Task Group           | SubDAG               |
++======================+======================+
+| Repeating patterns   |  Repeating patterns  |
+| live on the same DAG |  run as separate DAGs|
++----------------------+----------------------+
+| Follows schedule of  |  Creates a new       |
+| DAG                  |  schedule            |
++----------------------+----------------------+
+| Has same executor as |  Can specify an      |
+| DAG                  |  executor            |
++----------------------+----------------------+
+| Honors all pool      |  Does not honor pool |
+| configurations       |  configurations      |
++----------------------+----------------------+

Review comment:
       There are two parts here:
   * Parent DAG "SubDAG task": This is the link between the two, the parent and the SubDAG. It runs a "Sensor" underneath. I think you can even specify the poking interval.
   * SubDAG "DAG": Can be added to the `global` scope to make it available in the main screen, have different schedule intervals... just like a regular DAG. The only limitation is the name (AFAIK).
   
   SubDAG could easily be renamed as `ExternalDagSensor`
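
The "Sensor" behaviour described in the comment above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Airflow's implementation: `wait_for_child_run`, `get_state`, `max_pokes` and the state strings are hypothetical names chosen for the example; only the idea of a configurable poking interval comes from the comment.

```python
import time
from typing import Callable


def wait_for_child_run(
    get_state: Callable[[], str],
    poke_interval: float = 1.0,
    max_pokes: int = 10,
) -> int:
    """Poll a child run until it finishes; return the number of pokes used.

    All names here are hypothetical; this is not Airflow code.
    """
    for attempt in range(1, max_pokes + 1):
        state = get_state()
        if state == "success":
            return attempt
        if state == "failed":
            raise RuntimeError("child run failed")
        # Child is still running: wait before poking again.
        time.sleep(poke_interval)
    raise TimeoutError(f"child run not finished after {max_pokes} pokes")


# Simulated child run that succeeds on the third poke.
states = iter(["running", "running", "success"])
pokes = wait_for_child_run(lambda: next(states), poke_interval=0)
print(pokes)
```

If SubDAGs do work this way, the parent's "SubDAG task" simply blocks until the child run reaches a terminal state, so tasks further down in the parent DAG would wait on that one child run rather than on any schedule the child DAG declares for itself.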



