Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2022/01/17 22:51:50 UTC

[GitHub] [airflow] alex-astronomer edited a comment on issue #16764: TaskGroup dependencies handled inconsistently

alex-astronomer edited a comment on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1014936179


   
   Did some more research, and it leads me to believe that we should consider the TaskGroup to be a "dependable" in the same way that tasks are: TaskGroups may depend on, or be depended on by, tasks and other TaskGroups. If we do, we will be able to avoid many other problems like this one.
   
   ---
   
   *Appendix A: Expected Graph and Tree View*
   
   ![7280204D-5796-4741-8F2E-CB0E06C91A28](https://user-images.githubusercontent.com/89415310/149844546-425041a5-576f-4fe6-bcc7-c4124b8f62ec.png)
   ![CDDC9FF2-B1ED-476B-BF35-845FD6028BC0](https://user-images.githubusercontent.com/89415310/149844554-1cbf4e1e-8e73-41fa-a2e6-0bcf07c4cc98.png)
   
   ---
   
   I expect every definition below to produce a graph view, tree view, and actual running order that match the pictures in Appendix A.
   
   Here are the definitions that I found that give the correct graph and tree view:
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
       start >> taskgroup >> end
   ```
   
   and
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
   ```
   
   ---
   
   The definition below gives a graph and tree view that are consistent with each other but incorrect: they do not match Appendix A.
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       start >> taskgroup >> end
       task1 >> task2
   ```
   ![7280204D-5796-4741-8F2E-CB0E06C91A28](https://user-images.githubusercontent.com/89415310/149844849-8b4a31cd-8f1e-41a3-af53-f5f4fb0bc437.png)
   ![9B51C6F3-F2F9-4149-98D1-44FAC8353506](https://user-images.githubusercontent.com/89415310/149844855-6fcdc149-8cf1-4dc7-8734-1b7c4ffc9d57.png)
   
   ---
   
   The definition below gives inconsistent tree and graph views, as well as an incorrect running order. This is the example given by the OP of this issue.
   
   ```
   with TaskGroup('tg') as taskgroup:
       start >> taskgroup >> end
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
   ```
   ![53DDE4F7-DA19-4FBB-A1D3-359F4F7763F4](https://user-images.githubusercontent.com/89415310/149844811-2635d8ae-c993-403d-a9c3-4e7c3e344fa4.png)
   ![C3AED63E-C82A-4D16-A82F-81E344566624](https://user-images.githubusercontent.com/89415310/149844820-b8a91a52-810a-4235-97b8-c17b4826d193.png)
   
   ---
   
   The examples and diagrams above show that a few events, depending on the order in which they occur, affect the correctness of the dependencies in the DAG as well as the graph and tree views, which are sometimes inconsistent with each other. The significant events in these definitions, as far as I can see, are:
   1. taskgroup variable defined
   2. internal tasks defined
   3. dependency set between `start >> taskgroup >> end`
   4. "internal" dependency set between `hello1 >> hello2`
   
   Step 1 must take place before steps 2, 3, or 4 can happen. That leaves 3 steps whose relative order can vary, and that order affects the graph view, tree view, and running order of the DAG.
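
One caveat worth making explicit: in plain Python, step 4 cannot come before step 2, because `task1 >> task2` needs both variables to exist. The sketch below enumerates the orderings of steps 2, 3, and 4 under that single assumed constraint. The function name `valid_orderings` is hypothetical and only for illustration; nothing here touches Airflow itself.

```python
from itertools import permutations

# Step numbers follow the list above:
#   2 = internal tasks defined
#   3 = start >> taskgroup >> end
#   4 = hello1 >> hello2
STEPS = (2, 3, 4)


def valid_orderings():
    """Return every ordering of steps 2-4 in which the tasks are defined
    (step 2) before the internal dependency between them is set (step 4)."""
    return [
        order
        for order in permutations(STEPS)
        if order.index(2) < order.index(4)
    ]


for order in valid_orderings():
    print(order)
```

Under that assumption three orderings remain: 2, 4, 3 is the first definition above, 2, 3, 4 is the definition with consistent but incorrect views, and 3, 2, 4 is the OP's example.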
   
   I believe that all of the definitions above should give the running order and graph/tree view specified in Appendix A. In other words, it should be possible to run steps 2, 3, and 4 from the list above in any order and always get the same result.
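
To illustrate the order-independence argued for here, below is a toy model, not Airflow's implementation. It assumes one possible approach: `>>` only records an edge, and group-level edges are expanded to task-level edges after everything has been declared. All names (`Node`, `Task`, `Group`, `resolve`, `build`) are hypothetical.

```python
class Node:
    """Anything that can sit on either side of `>>` (a hypothetical 'dependable')."""

    def __init__(self, graph, name):
        self.graph = graph  # shared list of recorded (upstream, downstream) pairs
        self.name = name

    def __rshift__(self, other):
        # Only record the edge; groups are not expanded at this point.
        self.graph.append((self, other))
        return other  # allows chaining: a >> b >> c


class Task(Node):
    pass


class Group(Node):
    def __init__(self, graph, name):
        super().__init__(graph, name)
        self.children = []

    def add(self, task_id):
        task = Task(self.graph, f"{self.name}.{task_id}")
        self.children.append(task)
        return task


def resolve(graph):
    """Expand group-level edges into task-level edges once everything is declared."""

    def boundary(node, side):
        # side 0: children with no downstream inside the group (leaves)
        # side 1: children with no upstream inside the group (roots)
        if not isinstance(node, Group):
            return [node]
        internal = [e for e in graph if e[0] in node.children and e[1] in node.children]
        linked = {e[side] for e in internal}
        return [c for c in node.children if c not in linked]

    return sorted({
        (up.name, down.name)
        for u, d in graph
        for up in boundary(u, 0)    # a group as upstream contributes its leaves
        for down in boundary(d, 1)  # a group as downstream contributes its roots
    })


def build(order):
    """Run steps 2, 3, 4 in the given order; step 1 always happens first."""
    graph = []
    start, end = Task(graph, "start"), Task(graph, "end")
    tg = Group(graph, "tg")  # step 1: taskgroup variable defined
    tasks = {}

    def step2():  # internal tasks defined
        tasks["h1"], tasks["h2"] = tg.add("hello1"), tg.add("hello2")

    def step3():  # dependency set between start >> taskgroup >> end
        start >> tg >> end

    def step4():  # internal dependency set between hello1 >> hello2
        tasks["h1"] >> tasks["h2"]

    steps = {2: step2, 3: step3, 4: step4}
    for number in order:
        steps[number]()
    return resolve(graph)


results = {order: build(order) for order in [(2, 3, 4), (2, 4, 3), (3, 2, 4)]}
for order, edges in results.items():
    print(order, edges)
```

Because nothing is expanded until `resolve` runs, all three orderings yield the same task-level edges in this toy model. Whether deferred resolution fits Airflow's actual internals is a separate question that this sketch does not answer.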

