Posted to commits@airflow.apache.org by GitBox <gi...@apache.org> on 2021/07/01 22:09:51 UTC

[GitHub] [airflow] tomyedwab opened a new issue #16764: TaskGroup dependencies handled inconsistently

tomyedwab opened a new issue #16764:
URL: https://github.com/apache/airflow/issues/16764


   **Apache Airflow version**: 2.0.1
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: Various (GCP & local Python)
   - **OS** (e.g. from /etc/os-release): Various (linux, OSX)
   
   **What happened**:
   
   I read the following documentation about Task Groups:
   https://airflow.apache.org/docs/apache-airflow/stable/concepts/index.html
   https://www.astronomer.io/guides/task-groups
   
   From this documentation it seemed that dependencies between Task Groups are possible, which is a really nice feature for complex DAGs where adding a task to one group no longer involves updating the dependencies of tasks in downstream groups.
   
   I implemented a Task Group with dependency relationships to start and end dummy tasks. However, when the DAG was run, the start task, the end task, and the first task of the group all ran simultaneously. It took me a while to see what I was doing wrong, which was that I was adding the group dependencies *before* adding tasks to the group.
   
   One big source of confusion here is that the Graph View of the DAG does show connecting lines from the start/end tasks to the Task Group, so it __looks__ like there should be dependencies when there aren't any. The Tree View however shows no such dependencies.
   
   **What you expected to happen**:
   
   I would expect the Graph View to show the same dependencies as the Tree View, and not show dependencies that aren't actually there.
   
   My mental model from reading the documentation was that the dependencies were set on the group, whereas it seems as if the dependencies are actually set on whatever tasks happen to be in the group at the time the dependency is added.
   
   If this is indeed how Task Groups are intended to work it might be worth clarifying this somewhere in the documentation and not just rely on examples that do the right thing.
   
   **How to reproduce it**:
   
   Here is an example that shows how my DAG was laid out:
   
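   ```
   from datetime import datetime

   from airflow import DAG
   from airflow.operators.dummy import DummyOperator
   from airflow.operators.python import PythonOperator
   from airflow.utils.task_group import TaskGroup

   # Placeholders: the original report did not show these two definitions.
   default_args = {}


   def _print_hello():
       print('hello')


   with DAG(
       'task_group_test',
       default_args=default_args,
       description='Task Group Test',
       start_date=datetime(2021, 7, 1),
       schedule_interval=None) as dag:
   
       start = DummyOperator(task_id='start')
       end = DummyOperator(task_id='end')
   
       with TaskGroup('tg') as taskgroup:
           # Problematic order: the group has no tasks yet at this point.
           start >> taskgroup >> end
   
           task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
           task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
           task1 >> task2
   ```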
   
   Here is what I see in the Graph View:
   ![image](https://user-images.githubusercontent.com/1458589/124194614-bd804780-da7d-11eb-9568-dd3e69015288.png)
   
   Here is what I see in the Tree View:
   ![image](https://user-images.githubusercontent.com/1458589/124194659-d1c44480-da7d-11eb-8af1-70f18645e41d.png)
   
   If I move the `start >> taskgroup >> end` line below the `task1 >> task2` line, the Graph View is identical, but the Tree View matches my expectation:
   
   ![image](https://user-images.githubusercontent.com/1458589/124194858-24056580-da7e-11eb-9ac2-0091165f3083.png)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@airflow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [airflow] lucharo commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
lucharo commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1064104399


   Hello, I am experiencing a similar issue with nested TaskGroups (TG). This is my current GraphView:
   <img width="488" alt="image" src="https://user-images.githubusercontent.com/47890755/157677262-e9535404-145e-4dea-be36-66d4872ea2ff.png">
   The tree view reveals the same: 
   <img width="248" alt="image" src="https://user-images.githubusercontent.com/47890755/157677341-f46c6005-3c6b-4478-9e18-efdcec794d27.png">
   
   This is not the behaviour I would expect to observe based on how I define my dag. Within ANAH-LOAD and DPI-LOAD there is a TG for each year and for each month and then each month contains several tasks. This is the code where I define the top layer of the nested TGs:
   ```py
   with TaskGroup(f'{platform}-LOAD', dag = dag) as TG:
       year_tasks = []
       for year_list in res:
           with TaskGroup(f'{year_list[0][:4]}', dag = dag) as tg_year:
               ym_tasks = []
               for ym in year_list:
                   ym_tasks.append(populate_task_group(ym, gvars))
   
           year_tasks.append(ym_tasks)
   ```
   
   where the function `populate_task_group` simply returns a `TaskGroup` object and `year_list` contains dates in `ym` format (e.g. 201901 for January 2019). **Note: the code above is part of a function (`populate()`) that returns the following:**
   
   ```py
   return TG >> end_email
   ```
   
   where `end_email` is an `EmailOperator` as you can imagine
   
   Then finally I define my DAG order/sequence as:
   
   ```py
   reset >> start_populate_email >>  [populate(), populate('ANAH',HadoopCluster.ProdAnaH)]
   ```
   
   Hope I've made myself clear, all help is appreciated and please message me if you need more details
   
   ### Expected behaviour 
   
   I would expect the logic/graph view/tree view to be:
                      
   ```                                                            
   cleanrefresh >> start-populate-notification >> ANAH-LOAD >> END-EMAIL-ANAH 
                                               >> DPI-LOAD >> END-EMAIL-DPI
   ```
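
One possible contributor to the graph above, offered as an assumption since nothing in this thread confirms it: in Airflow, `a >> b` evaluates to `b`, so `return TG >> end_email` hands back `end_email` rather than `TG`, and the list passed to `start_populate_email >> [...]` would then contain the two end emails. The `Node` class and the simplified `populate` below are hypothetical stand-ins that only mimic that return-value convention; they are not Airflow code.

```python
class Node:
    """Hypothetical stand-in for a task or TaskGroup; not Airflow code."""

    def __init__(self, name):
        self.name = name
        self.downstream = []

    def __rshift__(self, other):
        # Record the edge(s), then return the right-hand operand,
        # which is what makes chains like `a >> b >> c` work.
        targets = other if isinstance(other, list) else [other]
        self.downstream.extend(targets)
        return other


def populate(platform):
    # Simplified: one group node and one end-email node per platform.
    group = Node(f"{platform}-LOAD")
    end_email = Node(f"END-EMAIL-{platform}")
    return group >> end_email  # evaluates to end_email, not group


start = Node("start-populate-notification")
start >> [populate("DPI"), populate("ANAH")]

# Names of the nodes that ended up directly downstream of `start`.
wired = [node.name for node in start.downstream]
print(wired)
```

If that assumption holds, setting `TG >> end_email` on its own line and returning `TG` from `populate()` would be one way to get the expected ordering.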





[GitHub] [airflow] alex-astronomer commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1014936179


   Did some more research, and it leads me to believe that if we consider the TaskGroup to be a "dependable" in the same way that we consider tasks able to depend on each other (that is, TaskGroups may depend on, or be depended on by, Tasks and other TaskGroups), then we will be able to avoid many other problems like this one.
   
   ---
   
   *Appendix A: Expected Graph and Tree View*
   
   ![7280204D-5796-4741-8F2E-CB0E06C91A28](https://user-images.githubusercontent.com/89415310/149844546-425041a5-576f-4fe6-bcc7-c4124b8f62ec.png)
   ![CDDC9FF2-B1ED-476B-BF35-845FD6028BC0](https://user-images.githubusercontent.com/89415310/149844554-1cbf4e1e-8e73-41fa-a2e6-0bcf07c4cc98.png)
   
   ---
   
   I expect all definitions below to give a graph view, tree view, and actual running order to look like the pictures linked in Appendix A.
   
   Here are the definitions that I found that give the correct graph and tree view:
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
       start >> taskgroup >> end
   ```
   
   and
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
   ```
   
   ---
   
   The definition below gives a graph and tree view that are consistent with each other, but that are not correct and do not match Appendix A:
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       start >> taskgroup >> end
       task1 >> task2
   ```
   ![7280204D-5796-4741-8F2E-CB0E06C91A28](https://user-images.githubusercontent.com/89415310/149844849-8b4a31cd-8f1e-41a3-af53-f5f4fb0bc437.png)
   ![9B51C6F3-F2F9-4149-98D1-44FAC8353506](https://user-images.githubusercontent.com/89415310/149844855-6fcdc149-8cf1-4dc7-8734-1b7c4ffc9d57.png)
   ---
   The definition below gives an inconsistent tree and graph view, as well as incorrect running order.  This is the example given by OP of this issue.
   
   ```
   with TaskGroup('tg') as taskgroup:
       start >> taskgroup >> end
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
   ```
   ![53DDE4F7-DA19-4FBB-A1D3-359F4F7763F4](https://user-images.githubusercontent.com/89415310/149844811-2635d8ae-c993-403d-a9c3-4e7c3e344fa4.png)
   ![C3AED63E-C82A-4D16-A82F-81E344566624](https://user-images.githubusercontent.com/89415310/149844820-b8a91a52-810a-4235-97b8-c17b4826d193.png)
   
   ---
   
   What we can see from the examples and the diagrams above is that there are a few events which depending on their order can affect the correctness of the dependencies in the DAG as well as the graph and tree view, which are sometimes inconsistent with each other.  The events that are significant in these definitions that I can see are:
   1. taskgroup variable defined
   2. internal tasks defined
   3. dependency set between `start >> taskgroup >> end`
   4. "internal" dependency set between `hello1 >> hello2`
   
   Before steps 2, 3, or 4 happen, we must ensure that step 1 has taken place.  This means that we are left with 3 steps whose order can vary and affect the graph view, tree view, and running order of the DAG.
   
   I believe that all of the definitions above should give the running order and graph/tree view specified in Appendix A.  This means that steps 2, 3, and 4 from the list above should be runnable in any order, with the same result every time.
   
   
   





[GitHub] [airflow] alex-astronomer edited a comment on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
alex-astronomer edited a comment on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1014936179


   
   Did some more research, and it leads me to believe that if we consider the TaskGroup to be a "dependable" in the same way that we consider tasks able to depend on each other (that is, TaskGroups may depend on, or be depended on by, Tasks and other TaskGroups), then we will be able to avoid many other problems like this one.
   
   ---
   
   *Appendix A: Expected Graph and Tree View*
   
   ![7280204D-5796-4741-8F2E-CB0E06C91A28](https://user-images.githubusercontent.com/89415310/149844546-425041a5-576f-4fe6-bcc7-c4124b8f62ec.png)
   ![CDDC9FF2-B1ED-476B-BF35-845FD6028BC0](https://user-images.githubusercontent.com/89415310/149844554-1cbf4e1e-8e73-41fa-a2e6-0bcf07c4cc98.png)
   
   ---
   
   I expect all definitions below to give a graph view, tree view, and actual running order to look like the pictures linked in Appendix A.
   
   Here are the definitions that I found that give the correct graph and tree view:
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
       start >> taskgroup >> end
   ```
   
   and
   
   ```
   with TaskGroup('tg') as taskgroup:  
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)  
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)  
       task1 >> task2  
     
   start >> taskgroup >> end
   ```
   
   ---
   
   The definition below gives a graph and tree view that are consistent with each other, but that are not correct and do not match Appendix A:
   
   ```
   with TaskGroup('tg') as taskgroup:
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       start >> taskgroup >> end
       task1 >> task2
   ```
   
   ![1F30F659-8BBC-4014-997A-EB00F0BB0D42](https://user-images.githubusercontent.com/89415310/149845443-bddedfde-1453-4a09-978a-27d0bf8b56fa.png)
   ![6641254F-B31D-4BF1-9846-C6B17875C223](https://user-images.githubusercontent.com/89415310/149845454-54a4a8ab-8f67-42f1-8d92-2e9285670f26.png)
   
   ---
   The definition below gives an inconsistent tree and graph view, as well as incorrect running order.  This is the example given by OP of this issue.
   
   ```
   with TaskGroup('tg') as taskgroup:
       start >> taskgroup >> end
       task1 = PythonOperator(task_id='hello1', python_callable=_print_hello)
       task2 = PythonOperator(task_id='hello2', python_callable=_print_hello)
       task1 >> task2
   ```
   ![53DDE4F7-DA19-4FBB-A1D3-359F4F7763F4](https://user-images.githubusercontent.com/89415310/149844811-2635d8ae-c993-403d-a9c3-4e7c3e344fa4.png)
   ![C3AED63E-C82A-4D16-A82F-81E344566624](https://user-images.githubusercontent.com/89415310/149844820-b8a91a52-810a-4235-97b8-c17b4826d193.png)
   
   ---
   
   What we can see from the examples and the diagrams above is that there are a few events which depending on their order can affect the correctness of the dependencies in the DAG as well as the graph and tree view, which are sometimes inconsistent with each other.  The events that are significant in these definitions that I can see are:
   1. taskgroup variable defined
   2. internal tasks defined
   3. dependency set between `start >> taskgroup >> end`
   4. "internal" dependency set between `hello1 >> hello2`
   
   Before steps 2, 3, or 4 happen, we must ensure that step 1 has taken place.  This means that we are left with 3 steps whose order can vary and affect the graph view, tree view, and running order of the DAG.
   
   I believe that all of the definitions above should give the running order and graph/tree view specified in Appendix A.  This means that steps 2, 3, and 4 from the list above should be runnable in any order, with the same result every time.
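
The three orderings compared in this comment can be reproduced with a small toy model. It is only a sketch under an assumption taken from the issue description: that a group dependency is applied to whichever root and leaf tasks are in the group at the moment the dependency is set. The function `build` and the step names are hypothetical and do not correspond to Airflow internals.

```python
EXPECTED = {("start", "hello1"), ("hello1", "hello2"), ("hello2", "end")}


def build(order):
    """Replay the definition steps in `order` and return the resulting edges."""
    edges = set()   # (upstream, downstream) task-name pairs
    members = []    # tasks currently inside the group

    def roots():
        # Group members with no upstream task inside the group.
        return [m for m in members if not any((u, m) in edges for u in members)]

    def leaves():
        # Group members with no downstream task inside the group.
        return [m for m in members if not any((m, d) in edges for d in members)]

    for step in order:
        if step == "tasks":          # step 2: internal tasks defined
            members.extend(["hello1", "hello2"])
        elif step == "group_dep":    # step 3: start >> taskgroup >> end
            # Assumed snapshot behaviour: only current members are wired up.
            edges.update(("start", root) for root in roots())
            edges.update((leaf, "end") for leaf in leaves())
        elif step == "inner_dep":    # step 4: hello1 >> hello2
            edges.add(("hello1", "hello2"))
    return edges


for order in [
    ("tasks", "inner_dep", "group_dep"),
    ("tasks", "group_dep", "inner_dep"),
    ("group_dep", "tasks", "inner_dep"),
]:
    print(order, build(order) == EXPECTED)
```

In this model only the first ordering yields the Appendix A edges; the second adds extra edges from `start` and to `end`, and the third leaves `start` and `end` unconnected.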





[GitHub] [airflow] alex-astronomer commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
alex-astronomer commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1014913471


   I think in order to really tackle this one, there are two issues here that need to be addressed.
   
   1. Just like the ticket mentions, we do expect that the TaskGroup can be a part of the dependency chain right?  So if someone were to build a dependency like the "broken" example from this ticket, then the tasks would still all be connected like they are supposed to be.  That would be my understanding at least, from the perspective of a user rather than a developer.  By using the taskgroup as a "top-level" dependency, and handling all "sub-dependencies" within the TaskGroup separately, I think this problem could be solved.
   2. The graph view and tree view are showing inconsistencies, and my understanding is that the tree view dependencies are being honored in this case, rather than the ones that are showing in the graph view.
   
   I believe that solving these two problems independently, with respect to the TaskGroup bugs shown in this issue, will give a better base for the platform and better integration with TaskGroups.  Thoughts on anything that I've written here?  I also believe that one solution may inform the other.  I know that you've already come up with some ideas @ashb, but I'm really curious to see how you feel about tackling the two problems that I've given here separately, using one to inform the other.





[GitHub] [airflow] ashb commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-875549047


   Oh yeah.
   
   Okay, so why this happens:
   
   TaskGroups don't actually exist as dependencies, so when you do `start >> taskgroup >> end` you are setting the downstream of start and the upstream of end based on the _current_ tasks in the taskgroup.
   
   To fix this properly we would need a "two pass" approach (which, I think, isn't a problem): the first pass happens when parsing the DAG file, and when we do `start >> taskgroup` we store the actual TaskGroup there; only in the second pass (likely when we "bag" the DAG, handled internally in Airflow's parsing process) would we turn TaskGroups in the dependencies into their actual values.
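
The two-pass idea can be sketched in plain Python. This is a toy illustration, not a proposed patch: tasks are represented by strings, and `Group`, `pending`, and `resolve` are hypothetical names invented for the example.

```python
class Group:
    """Hypothetical stand-in for a TaskGroup; members are task names."""

    def __init__(self, name):
        self.name = name
        self.members = []


def resolve(pending):
    """Second pass: expand Group endpoints once every task is known."""
    task_edges = {
        (up, down)
        for up, down in pending
        if isinstance(up, str) and isinstance(down, str)
    }

    def roots(group):
        return [m for m in group.members
                if not any((u, m) in task_edges for u in group.members)]

    def leaves(group):
        return [m for m in group.members
                if not any((m, d) in task_edges for d in group.members)]

    edges = set(task_edges)
    for up, down in pending:
        ups = leaves(up) if isinstance(up, Group) else [up]
        downs = roots(down) if isinstance(down, Group) else [down]
        edges.update((a, b) for a in ups for b in downs)
    return edges


# First pass: record dependencies as written, in the order from this issue.
tg = Group("tg")
pending = [("start", tg), (tg, "end")]   # the group is still empty here
tg.members += ["hello1", "hello2"]
pending.append(("hello1", "hello2"))

edges = resolve(pending)
print(sorted(edges))
```

Because the group is only expanded in `resolve`, the order in which the first-pass statements were written no longer matters in this model.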





[GitHub] [airflow] ashb commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-875549625


   The workaround for now is, as you said, to move the `start >> taskgroup >> end` line outside of the TG context.





[GitHub] [airflow] ashb commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
ashb commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1015338563


   That would work, but there are also other problems that come from TaskGroups not actually existing in the DAG/dependency chain, so another option is to let tasks depend directly on TaskGroups (i.e. rename downstream_task_ids to downstream_ids).





[GitHub] [airflow] uranusjr commented on issue #16764: TaskGroup dependencies handled inconsistently

Posted by GitBox <gi...@apache.org>.
uranusjr commented on issue #16764:
URL: https://github.com/apache/airflow/issues/16764#issuecomment-1015058864


   One way I can think of to get rid of all the complexity is to _prohibit_ the task group from being used before the context manager exits, i.e.
   
   ```python
   with TaskGroup('tg') as taskgroup:
       task1 = ...
       task2 = ...
       task1 >> task2  # Fine.
       start >> taskgroup >> end  # Throws error!
   ```
   
   ```python
   with TaskGroup('tg') as taskgroup:
       task1 = ...
       task2 = ...
       task1 >> task2
   start >> taskgroup >> end  # Must do this instead.
   ```
   
   This essentially ensures that step 3 happens after step 2, and leaves only steps 3 and 4 interchangeable. Does
   
   ```python
   with TaskGroup('tg') as taskgroup:
       task1 = ...
       task2 = ...
   start >> taskgroup >> end
   task1 >> task2
   ```
   
   produce an “expected” graph? If it does, all problems would be solved from what I can tell; if not, this is the only thing we need to fix (aside from implementing logic to prohibit a task group from being used before exiting).
   
   This would break some existing usages, but I _think_ we might be able to get away with the breaking change because most of the usages that would break do not work very well right now anyway (as shown in this issue), and are therefore unlikely to be in wide use.
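
A guard of this kind could look roughly like the sketch below. The `TaskGroup` class here is a hypothetical stand-in written for illustration (it is not `airflow.utils.task_group.TaskGroup`), tasks are represented by plain strings, and the choice of `RuntimeError` is an assumption.

```python
class TaskGroup:
    """Hypothetical stand-in that refuses `>>` until its `with` block exits."""

    def __init__(self, group_id):
        self.group_id = group_id
        self._closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self._closed = True
        return False  # never swallow exceptions

    def _check_closed(self):
        if not self._closed:
            raise RuntimeError(
                f"TaskGroup {self.group_id!r} used in a dependency "
                "before its `with` block exited"
            )

    def __rshift__(self, other):   # taskgroup >> other
        self._check_closed()
        return other

    def __rrshift__(self, other):  # other >> taskgroup
        self._check_closed()
        return self


def try_inside():
    try:
        with TaskGroup("tg") as tg:
            "start" >> tg >> "end"  # rejected: the group is still open
    except RuntimeError:
        return "error"
    return "ok"


def try_outside():
    with TaskGroup("tg") as tg:
        pass
    "start" >> tg >> "end"  # accepted: the group is complete
    return "ok"


print(try_inside(), try_outside())
```

The sketch only shows where the check would sit; recording the actual dependencies is left out.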




