Posted to dev@airflow.apache.org by Kevin Yang <yr...@gmail.com> on 2021/03/04 05:21:25 UTC

[DISCUSS] TaskGroup in Tree View

Hi team,

We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
really like it. Thanks to Yu Qian and everyone that contributed to it. To
continue moving towards the goal of replacing SubDagOperator with
TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
Tree View.

*Why do we need TaskGroup in Tree View?*
For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
preferred view for its loading speed and simpler representation.
SubDagOperator is often used to provide an isolated view into a subset of
tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
need to support Tree View.

*What should TaskGroup look like in Tree View?*
We didn't reach a conclusion during the 1st iteration of TaskGroup. At
Airbnb, we use SubDag mostly to provide a zoom-in view on a small set of
tasks, and the SubDag zoom-in feature has worked well for us. We'd like to
see TaskGroup provide a zoom-in option for both Graph View and Tree View,
but would also like to hear everyone's thoughts.

*What needs to be in TaskGroup and what doesn't?*
TaskGroup started off as a pure UI concept while SubDag is something more,
e.g. it has its own DagRun and thus isolated scheduling decisions, it can
serve as a logical isolation layer that holds different sets of DAG level
params, etc. While we only use SubDag as a UI feature, I think it would be a
good opportunity for us to discuss what should be in TaskGroup and what
shouldn't.

Please don't hesitate to share your thoughts.


Cheers,
Kevin Y

Re: [DISCUSS] TaskGroup in Tree View

Posted by Kevin Yang <yr...@gmail.com>.
Hi Yu Qian,

Sorry, I somehow missed this email 😢 It is great to hear that you are also
interested in adding TaskGroup to Tree View; I hope you still are 😀 I like
how the TaskGroup is collapsible in the prototype, though I wonder if we
can replicate the zoom-in behavior of SubDag--that might be the most useful
feature in SubDag for the big DAG owners.


Cheers,
Kevin Y

On Mon, Apr 19, 2021 at 7:18 AM Yu Qian <yu...@gmail.com> wrote:

> Hi, all,
>
> I'm interested in contributing to adding TaskGroup to Tree View. Here's a
> prototype <https://yuqian90.github.io/task_group_tree/> of what it could
> look like. Suggestions are welcome.
>
> I understand AIP-38
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38+Modern+Web+Application> plans
> to use modern web technology such as React, etc to revamp the Airflow UI.
> I'm not sure where we are on that front. With AIP-38
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38+Modern+Web+Application> in
> mind, if we make improvements to pages such as Tree View, it makes a lot of
> sense to contribute it as a reusable TaskInstanceTree component so that it
> can be used in more than just the Tree View page itself. One other example
> where a TaskInstanceTree component can be useful is when displaying the
> confirmation page when a user clears/marks success/marks failed task
> instances. Right now the confirmation page just concatenates all the
> TaskInstance text representations. It's very difficult to read when a lot of
> task instances are cleared or marked. If we put the task instances in the
> confirmation page into a TaskInstanceTree component with TaskGroup support,
> it can be much easier to read. If you have ideas regarding how to
> contribute to Tree View so it's easy to incorporate into AIP-38, definitely
> let me know.
>
> One
>
> On Tue, Mar 9, 2021 at 1:19 AM Ash Berlin-Taylor <as...@apache.org> wrote:
>
>> I agree, but we should see which of those we can implement just on the
>> parsing side - i.e. can we continue to make the scheduler not have to care
>> about Task Groups?
>>
>> If so, then changes like the default args example are small enough
>> that they don't need an AIP (IMO)
>>
>> -ash
>>
>> On 8 March 2021 17:12:07 GMT, Daniel Imberman <da...@gmail.com>
>> wrote:
>>>
>>> I personally think that TaskGroup should go beyond being “just” a UI
>>> concept. I think that there are a lot of use-cases where people might want
>>> to perform a single operation across an entire group of tasks. I think that
>>> Bin points out a few really good examples (default arguments and group
>>> delete are good examples). I also have a proposal coming out hopefully
>>> later this week that will offer some more functionality to TaskGroup
>>> objects as well.
>>>
>>> I don’t personally see the benefit of keeping them “UI only.” If we want
>>> to be able to group delete or add external sensors to a group of tasks we’d
>>> basically need to create another concept that centers around “a grouping of
>>> tasks” which I think might create confusion.
>>>
>>> On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yu...@gmail.com> wrote:
>>>
>>> Hi, all, it's really exciting to see the great discussions about
>>> TaskGroup.
>>>
>>> There are some interesting ideas here.
>>> - Tree View support for TaskGroup: I think this can mostly be achieved
>>> at the web layer? Changes probably involve tree.html and www/views.py.
>>> Should we change Tree View to organize tasks based on the TaskGroup
>>> hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree
>>> View is organized into a flattened graph hierarchy, which means the same
>>> task can appear multiple times in Tree View.
>>> - Clear an entire TaskGroup. We should be able to do this in graph.html
>>> and www/views.py too. E.g. the UI passes the group_id of the TaskGroup
>>> to the web server, which then clears the list of tasks in the TaskGroup.
>>> The TaskGroup is already an iterable of its child tasks, so this should
>>> be possible. In fact, I've heard from several users that they sometimes
>>> want to select multiple tasks on Graph View with the mouse and then clear
>>> all of them at once. This is actually a very similar problem to clearing
>>> a TaskGroup.
>>>
>>> Some other ideas such as default_args and ExternalTaskSensor support
>>> sound good too. We can probably continue the discussion on those individual
>>> issues/PRs.
>>>
>>> On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bi...@gmail.com>
>>> wrote:
>>>
>>>> Hi Kaxil,
>>>>
>>>> One use case I have is to reuse TaskGroup across different DAGs as a
>>>> predefined sub-workflow. For example, my team is currently building out a
>>>> data platform that will allow a certain level of self-serve ability. Users
>>>> of the platform (mostly analysts and scientists) should focus on business
>>>> logic - the transformation part - without needing to pay much attention to
>>>> some standard operations (e.g. from S3 to Redshift staging table - validate
>>>> data - swap to production table), as these types of tasks are boring and
>>>> repetitive. Reusing these sub-workflows also enables us to load data to a
>>>> different destination/warehouse without users needing to change their code.
>>>> We can also have a notification sub-workflow that allows us to swap in and
>>>> out Slack/PagerDuty/etc. over time without impacting the user.
>>>>
>>>> Other use cases
>>>> - allow default_args at TaskGroup level as in this issue:
>>>> https://github.com/apache/airflow/issues/13911
>>>> - ExternalTaskSensor on TaskGroup as mentioned by Nathan:
>>>> https://github.com/apache/airflow/issues/14563
>>>> - delete an entire TaskGroup:
>>>> https://github.com/apache/airflow/issues/14529
>>>>
>>>> All these use cases go beyond the pure UI level and require operations
>>>> (viewing/triggering/deleting/waiting/etc.) on *a group of tasks*. I
>>>> think we can easily implement/formalize this with the current API without
>>>> changing the backend too much (this PR
>>>> https://github.com/apache/airflow/pull/14640 shows a small example).
>>>>
>>>> What do other people think?
>>>>
>>>> Best
>>>> Bin
>>>>
>>>> On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <ka...@gmail.com> wrote:
>>>>
>>>>> Hi all, interesting discussion. I would love to hear about some more
>>>>> use-cases where TaskGroup needs to be something more than the UI concept.
>>>>>
>>>>> All of Kevin's use-cases can be achieved while keeping it as a UI
>>>>> concept. Xinbin, can you please expand a bit on your use case?
>>>>>
>>>>> Regards,
>>>>> Kaxil
>>>>>
>>>>> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bi...@gmail.com> wrote:
>>>>>
>>>>>> Hi Kevin, Vikram, and Nathan,
>>>>>>
>>>>>> I think we don't need to restrict too much on keeping TaskGroup only
>>>>>> as a UI concept. We are already using TaskGroup to author DAGs and create
>>>>>> dependencies, which already lies a bit outside the UI.
>>>>>> To fully replace SubDagOperator, I think it's necessary to expand
>>>>>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>>>>>
>>>>>> As for TaskGroupSensor specifically, I landed on the same approach as
>>>>>> Kevin, and I have created a draft PR here:
>>>>>> https://github.com/apache/airflow/pull/14640
>>>>>>
>>>>>> Cheers
>>>>>> Bin
>>>>>>
>>>>>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yr...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi Vikram,
>>>>>>>
>>>>>>> Good point. What I had in mind was getting the TaskGroup definition
>>>>>>> in a sensor, e.g. extract the _task_group field from serialized DAG, and
>>>>>>> query the DB for the TI states within.
>>>>>>>
>>>>>>> You are right that it might not be clean nor does it keep TaskGroup
>>>>>>> as a UI concept.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Kevin Y
>>>>>>>
>>>>>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka
>>>>>>> <vi...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>>> Kevin,
>>>>>>>>
>>>>>>>> I am not sure I understand your response to Nathan.
>>>>>>>>
>>>>>>>> I agree that it is also a valid use case, but I don't see how it
>>>>>>>> can be cleanly done while keeping TaskGroup only as a UI concept.
>>>>>>>> Would this require extending the TaskGroup concept to the backend?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Vikram
>>>>>>>>
>>>>>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Nathan,
>>>>>>>>>
>>>>>>>>> Thanks a lot for your input and it is indeed a valid use case.
>>>>>>>>> This can be done either keeping TaskGroup as a UI concept or bringing it
>>>>>>>>> into the backend. I'm curious to hear what others think.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Kevin Y
>>>>>>>>>
>>>>>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>>>>>> Nathan.Hadfield@king.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Kevin,
>>>>>>>>>>
>>>>>>>>>> A quick piece of input from our recent experiences of working
>>>>>>>>>> with TaskGroup is that we often have dependencies across DAGs that require
>>>>>>>>>> waiting upon the completion of all the tasks in a group. At the moment, you
>>>>>>>>>> basically have two options:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>>>>>
>>>>>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>>>>>> status so as to better enable downstream decision making.
>>>>>>>>>>
>>>>>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>>>>>> discussions here.
>>>>>>>>>>
>>>>>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Nathan
>>>>>>>>>>
>>>>>>>>>> *From: *Kevin Yang <yr...@gmail.com>
>>>>>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>>>>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>>>>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>>>>>
>>>>>>>>>> Hi team,
>>>>>>>>>>
>>>>>>>>>> We are very glad to see the introduction of TaskGroup in Airflow
>>>>>>>>>> 2.0 and really like it. Thanks to Yu Qian and everyone that contributed to
>>>>>>>>>> it. To continue moving towards the goal of replacing SubDagOperator with
>>>>>>>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>>>>>>>> Tree View.
>>>>>>>>>>
>>>>>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>>>>>
>>>>>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is
>>>>>>>>>> the preferred view for its loading speed and simpler representation.
>>>>>>>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>>>>>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>>>>>>>>> need to support Tree View.
>>>>>>>>>>
>>>>>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>>>>>
>>>>>>>>>> We didn't have a conclusion during the 1st iteration of
>>>>>>>>>> TaskGroup. In Airbnb, we use SubDag mostly for providing a zoom in view on
>>>>>>>>>> a small set of tasks and the SubDag zoom in feature worked well for us.
>>>>>>>>>> We'd like to see TaskGroup provide a zoom in option for both Graph View and
>>>>>>>>>> Tree View but also like to hear everyone's thoughts.
>>>>>>>>>>
>>>>>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>>>>>
>>>>>>>>>> TaskGroup started off as a pure UI concept while SubDag is
>>>>>>>>>> something more, e.g. it has its own DagRun thus isolated scheduling
>>>>>>>>>> decisions, it can serve as a logical isolation layer that holds different
>>>>>>>>>> sets of DAG level params, etc. While we only use SubDag as a UI feature, I
>>>>>>>>>> think it would be a good opportunity for us to discuss what should be
>>>>>>>>>> TaskGroup and what shouldn't.
>>>>>>>>>>
>>>>>>>>>> Please don't hesitate to share your thoughts.
>>>>>>>>>>
>>>>>>>>>> Cheers,
>>>>>>>>>>
>>>>>>>>>> Kevin Y
>>>>>>>>>>
>>>>>>>>>
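
For Nathan's cross-DAG use case and Kevin's idea of looking up the task
instance states inside a group, here is a minimal sketch in plain Python. It
is not Airflow's API: the helper names (tasks_in_group, group_is_done) and the
states dict are hypothetical stand-ins for whatever a sensor would read from
the metadata DB. It assumes that tasks inside a TaskGroup carry the group_id
as a dotted prefix of their task_id.

```python
from typing import Dict, Iterable, List


def tasks_in_group(task_ids: Iterable[str], group_id: str) -> List[str]:
    """Return the task ids that sit under group_id (hypothetical helper).

    Assumes group members are named '<group_id>.<task_name>'.
    """
    prefix = group_id + "."
    return [task_id for task_id in task_ids if task_id.startswith(prefix)]


def group_is_done(states: Dict[str, str], group_id: str) -> bool:
    """True once every task in the group has reached the 'success' state."""
    members = tasks_in_group(states, group_id)
    # An empty group counts as "not done" so a mistyped group_id never passes.
    return bool(members) and all(states[t] == "success" for t in members)


# Stand-in for task instance states of one DAG run.
states = {
    "extract": "success",
    "load.stage": "success",
    "load.validate": "running",
}

print(group_is_done(states, "load"))  # False: load.validate is still running
```

A sensor built this way would poll group_is_done instead of needing one
sensor per task or a trailing dummy task after the group.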

Re: [DISCUSS] TaskGroup in Tree View

Posted by Yu Qian <yu...@gmail.com>.
Hi, all,

I'm interested in contributing to adding TaskGroup to Tree View. Here's a
prototype <https://yuqian90.github.io/task_group_tree/> of what it could
look like. Suggestions are welcome.

I understand AIP-38
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-38+Modern+Web+Application>
plans to use modern web technology such as React to revamp the Airflow UI.
I'm not sure where we are on that front. With AIP-38 in mind, if we make
improvements to pages such as Tree View, it makes a lot of sense to
contribute them as a reusable TaskInstanceTree component so that it can be
used in more than just the Tree View page itself. One other example where a
TaskInstanceTree component can be useful is the confirmation page shown
when a user clears task instances or marks them as success or failed. Right
now the confirmation page just concatenates the text representations of all
the TaskInstances, which is very difficult to read when a lot of task
instances are cleared or marked. If we put the task instances in the
confirmation page into a TaskInstanceTree component with TaskGroup support,
it can be much easier to read. If you have ideas on how to contribute to
Tree View so that it's easy to incorporate into AIP-38, definitely let me
know.
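
To make the idea of a tree organized by TaskGroup hierarchy concrete, here is
a rough sketch in plain Python. It is not the prototype's code and not part of
Airflow; build_tree and render are hypothetical names. It assumes that a dot
in a task_id only ever comes from a TaskGroup prefix, which will not hold if
task names themselves contain dots.

```python
from typing import Dict, List


def build_tree(task_ids: List[str]) -> Dict[str, dict]:
    """Nest dotted task ids: 'a.b.c' becomes {'a': {'b': {'c': {}}}}."""
    root: Dict[str, dict] = {}
    for task_id in task_ids:
        node = root
        for part in task_id.split("."):
            node = node.setdefault(part, {})
    return root


def render(tree: Dict[str, dict], depth: int = 0) -> List[str]:
    """Render the tree as indented lines; groups get a trailing '/'."""
    lines: List[str] = []
    for name, children in tree.items():
        label = name + "/" if children else name
        lines.append("  " * depth + label)
        lines.extend(render(children, depth + 1))
    return lines


task_ids = ["start", "load.stage", "load.validate", "end"]
print("\n".join(render(build_tree(task_ids))))
```

Because each task appears exactly once under its group, a component built on
this shape could serve both Tree View and the clear/mark confirmation page.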

One


Re: [DISCUSS] TaskGroup in Tree View

Posted by Ash Berlin-Taylor <as...@apache.org>.
I agree, but we should see which of those we can implement just on the parsing side - i.e. can we continue to make the scheduler not have to care about Task Groups?

If so, then changes like the default args example are small enough that they don't need an AIP (IMO)

-ash
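
One way to read the "parsing side only" suggestion for group-level
default_args is sketched below in plain Python. The function name
resolve_task_args and the precedence order (DAG defaults, then group defaults,
then explicit task arguments) are assumptions for illustration, not Airflow's
implementation. The point is that the merge happens while the DAG file is
parsed, so the scheduler only ever sees ordinary, fully resolved task
arguments.

```python
from typing import Any, Dict, Optional


def resolve_task_args(
    dag_defaults: Dict[str, Any],
    group_defaults: Optional[Dict[str, Any]],
    task_args: Dict[str, Any],
) -> Dict[str, Any]:
    """Merge arguments at parse time; later sources override earlier ones.

    Assumed precedence: DAG default_args < group default_args < task args.
    """
    merged = dict(dag_defaults)          # copy so the inputs stay untouched
    merged.update(group_defaults or {})  # group-level overrides, if any
    merged.update(task_args)             # explicit task arguments win
    return merged


resolved = resolve_task_args(
    dag_defaults={"retries": 1, "owner": "data"},
    group_defaults={"retries": 3},
    task_args={"queue": "fast"},
)
print(resolved)
```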

On 8 March 2021 17:12:07 GMT, Daniel Imberman <da...@gmail.com> wrote:
>I personally think that TaskGroup should go beyond being “just” a UI concept. I think that there are a lot of use-cases where people might want to perform a single operation across an entire group of tasks. I think that Bin points out a few really good examples (default arguments and group delete are good examples). I also have a proposal coming out hopefully later this week that will offer some more functionality to TaskGroup objects as well.
>I don’t personally see the benefit of keeping them “UI only.” If we want to be able to group delete or add external sensors to a group of tasks we’d basically need to create another concept that centers around “a grouping of tasks” which I think might create confusion.
>On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yu...@gmail.com> wrote:
>Hi, all, it's really exciting to see the great discussions about TaskGroup.
>There are some interesting ideas here. - Tree View support for TaskGroup: I think this can mostly be achieved at the web layer? Changes probably involve tree.html and www/view.py. Should we change Tree View to organize tasks based on the TaskGroup hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree View is organized into a flattened graph hierarchy, which means the same task can appear multiple times in Tree View. - Clear an entire TaskGroup. We should be able to do this in graph.html and www/view.py too. E.g. the UI passes the group_id of the TaskGroup to the web server which then clears the list of tasks in the TaskGroup, which is already an iterable of its child tasks so this should be possible. In fact, I've heard from several users that they sometimes want to select multiple tasks on Graph View with the mouse and then clear all of them at once. This is actually a very similar problem as clearing a TaskGroup.
>Some other ideas such as default_args and ExternalTaskSensor support sound good too. We can probably continue the discussion on those individual issues/PRs.
>On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang < bin.huangxb@gmail.com [bin.huangxb@gmail.com] > wrote:
>Hi Kaxil,
>One use case I have is to reuse TaskGroup across different DAGs as a predefined sub-workflow. For example, my team is currently building out a data platform that will allow a certain level of self-serve ability. Users of the platform (mostly analyst and scientist) should focus on business logic - transformation part - while don't need to pay too much attention to some standard operations (i.e. from S3 to Redshift staging table - validate data - swap to production table), as these types of tasks are boring and repetitive. Reuse these sub-workflows also enables us to load data to a different destination/warehouse without users needing to change their code. We can also have a notification sub-workflow that allows us to swap in and out Slack/Pageduty/etc over time without impacting the user.
>Other use cases - allow default_args at TaskGroup level as in this issue: https://github.com/apache/airflow/issues/13911 [https://github.com/apache/airflow/issues/13911] - ExternalTaskSensor on TaskGroup as mentioned by Nathan: https://github.com/apache/airflow/issues/14563 [https://github.com/apache/airflow/issues/14563] - delete an entire TaskGroup: https://github.com/apache/airflow/issues/14529 [https://github.com/apache/airflow/issues/14529]
>All these use cases go beyond the pure UI level and require operations (viewing/triggering/deleting/waiting/etc) on a group of tasks. I think we can easily implement/formalize this with the current API without changing the backend too much (this PR https://github.com/apache/airflow/pull/14640 [https://github.com/apache/airflow/pull/14640] shows a small example).
>What do other people think?
>Best Bin
>On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik < kaxilnaik@gmail.com [kaxilnaik@gmail.com] > wrote:
>Hi all, interesting discussion. I would love to hear about some more use-cases where TaskGroup needs to be something more than the UI concept.
>All of Kevin's use-cases can be achieved while keeping it as a UI concept.Xinbin can you please expand a bit on your use case.
>Regards, Kaxil
>On Sat, Mar 6, 2021, 10:08 Xinbin Huang < bin.huangxb@gmail.com [bin.huangxb@gmail.com] > wrote:
>Hi Kevin, Vikram, and Nathan,
>I think we don't need to restrict too much on keeping TaskGroup only as a UI concept. We are already using TaskGroup to author DAGs and create dependencies, which already lies a bit outside the UI. To fully replace SubDagOperator, I think it's necessary to expand TaskGroup as a container for tasks than just UI concept.
>As for TaskGroupSensor specifically, I land with the same approach as Kevin, and I have created a draft PR here: https://github.com/apache/airflow/pull/14640 [https://github.com/apache/airflow/pull/14640]
>Cheers Bin
>On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang < yrqls21@gmail.com [yrqls21@gmail.com] > wrote:
>Hi Vikram,
>Good point. What I had in mind was getting the TaskGroup definition in a sensor, e.g. extract the _task_group field from serialized DAG, and query the DB for the TI states within.
>You are right that it might not be clean nor does it keep TaskGroup as a UI concept.
>
>Cheers, Kevin Y
>On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid> wrote:
>Kevin,
>I am not sure I understand your response to Nathan.
>I agree that it is also a valid use case, but I don't see how it can be cleanly done while keeping TaskGroup only as a UI concept. Would this require extending the TaskGroup concept to the backend?
>Best regards, Vikram
>On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang < yrqls21@gmail.com [yrqls21@gmail.com] > wrote:
>Hi Nathan,
>Thanks a lot for your input and it is indeed a valid use case. This can be done either keeping TaskGroup as a UI concept or bringing it into the backend. I'm curious to hear what others think.
>
>Cheers, Kevin Y
>On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield < Nathan.Hadfield@king.com [Nathan.Hadfield@king.com] > wrote:
>Hi Kevin,
>
>
>
>A quick piece of input from our recent experiences of working with TaskGroup is that we often have dependencies across DAGs that require waiting upon the completion of all the tasks in a group. At the moment, you basically have two options:
>
>
>
> 1. Create a sensor task in a DAG for every task in the group
> 2. Create a Dummy task after the group that a sensor waits on
>
>
>
>So, I would certainly like TaskGroups to have some notion of run status as to better enable downstream decision making.
>
>
>
>I’ve already created a feature ticket to try to add some kind of TaskGroup Sensor but perhaps this can also form part of the wider discussions here.
>
>
>
>https://github.com/apache/airflow/issues/14563
>
>
>
>Cheers,
>
>
>
>Nathan
>
>
>
>From: Kevin Yang <yrqls21@gmail.com>
>Date: Thursday, 4 March 2021 at 05:21
>To: dev@airflow.apache.org <dev@airflow.apache.org>
>Subject: [DISCUSS] TaskGroup in Tree View
>
>Hi team,
>
>
>
>We are very glad to see the introduction of TaskGroup in Airflow 2.0 and really like it. Thanks to Yu Qian and everyone that contributed to it. To continue moving towards the goal of replacing SubDagOperator with TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into Tree View.
>
>
>
>Why do we need TaskGroup in Tree View?
>
>For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the preferred view for its loading speed and simpler representation. SubDagOperator is often used to provide an isolated view into a subset of tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will need to support Tree View.
>
>
>
>What should TaskGroup look like in Tree View?
>
>We didn't have a conclusion during the 1st iteration of TaskGroup. In Airbnb, we use SubDag mostly for providing a zoom in view on a small set of tasks and the SubDag zoom in feature worked well for us. We'd like to see TaskGroup provide a zoom in option for both Graph View and Tree View but also like to hear everyone's thoughts.
>
>
>
>What needs to be in TaskGroup and what doesn't?
>
>TaskGroup started off as a pure UI concept while SubDag is something more, e.g. it has its own DagRun thus isolated scheduling decisions, it can serve as a logical isolation layer that holds different sets of DAG level params, etc. While we only use SubDag as a UI feature, I think it would be a good opportunity for us to discuss what should be TaskGroup and what shouldn't.
>
>
>
>Please don't hesitate to share your thoughts.
>
>
>
>
>
>Cheers,
>
>Kevin Y

Re: [DISCUSS] TaskGroup in Tree View

Posted by Daniel Imberman <da...@gmail.com>.
I personally think that TaskGroup should go beyond being “just” a UI concept. There are a lot of use cases where people might want to perform a single operation across an entire group of tasks, and Bin points out a few really good examples (default arguments and group delete). I also have a proposal coming out, hopefully later this week, that will offer some more functionality for TaskGroup objects.
I don’t personally see the benefit of keeping them “UI only.” If we want to be able to group delete or add external sensors to a group of tasks, we’d basically need to create another concept that centers around “a grouping of tasks”, which I think might create confusion.
On Mon, Mar 8, 2021 at 7:19 AM, Yu Qian <yu...@gmail.com> wrote:
Hi, all, it's really exciting to see the great discussions about TaskGroup.
There are some interesting ideas here.
- Tree View support for TaskGroup: I think this can mostly be achieved at the web layer? Changes probably involve tree.html and www/views.py. Should we change Tree View to organize tasks based on the TaskGroup hierarchy (no need to duplicate tasks in Tree View)? Currently the Tree View is organized into a flattened graph hierarchy, which means the same task can appear multiple times in Tree View.
- Clear an entire TaskGroup. We should be able to do this in graph.html and www/views.py too. E.g. the UI passes the group_id of the TaskGroup to the web server, which then clears the list of tasks in the TaskGroup. The TaskGroup is already an iterable of its child tasks, so this should be possible. In fact, I've heard from several users that they sometimes want to select multiple tasks on Graph View with the mouse and then clear all of them at once. This is actually a very similar problem to clearing a TaskGroup.
Some other ideas such as default_args and ExternalTaskSensor support sound good too. We can probably continue the discussion on those individual issues/PRs.
On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
Hi Kaxil,
One use case I have is to reuse TaskGroup across different DAGs as a predefined sub-workflow. For example, my team is currently building out a data platform that will allow a certain level of self-serve ability. Users of the platform (mostly analysts and scientists) should focus on business logic - the transformation part - and shouldn't need to pay too much attention to some standard operations (e.g. from S3 to Redshift staging table - validate data - swap to production table), as these types of tasks are boring and repetitive. Reusing these sub-workflows also enables us to load data to a different destination/warehouse without users needing to change their code. We can also have a notification sub-workflow that allows us to swap Slack/PagerDuty/etc. in and out over time without impacting the user.
Other use cases
- allow default_args at TaskGroup level as in this issue: https://github.com/apache/airflow/issues/13911
- ExternalTaskSensor on TaskGroup as mentioned by Nathan: https://github.com/apache/airflow/issues/14563
- delete an entire TaskGroup: https://github.com/apache/airflow/issues/14529
All these use cases go beyond the pure UI level and require operations (viewing/triggering/deleting/waiting/etc.) on a group of tasks. I think we can easily implement/formalize this with the current API without changing the backend too much (this PR https://github.com/apache/airflow/pull/14640 shows a small example).
What do other people think?
Best Bin
On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:
Hi all, interesting discussion. I would love to hear about some more use-cases where TaskGroup needs to be something more than the UI concept.
All of Kevin's use-cases can be achieved while keeping it as a UI concept. Xinbin, can you please expand a bit on your use case?
Regards, Kaxil
On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bin.huangxb@gmail.com> wrote:
Hi Kevin, Vikram, and Nathan,
I think we don't need to restrict too much on keeping TaskGroup only as a UI concept. We are already using TaskGroup to author DAGs and create dependencies, which already lies a bit outside the UI. To fully replace SubDagOperator, I think it's necessary to expand TaskGroup into a container for tasks rather than just a UI concept.
As for TaskGroupSensor specifically, I landed on the same approach as Kevin, and I have created a draft PR here: https://github.com/apache/airflow/pull/14640
Cheers Bin
On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yrqls21@gmail.com> wrote:
Hi Vikram,
Good point. What I had in mind was getting the TaskGroup definition in a sensor, e.g. extract the _task_group field from serialized DAG, and query the DB for the TI states within.
You are right that it might not be clean nor does it keep TaskGroup as a UI concept.

Cheers, Kevin Y
On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid> wrote:
Kevin,
I am not sure I understand your response to Nathan.
I agree that it is also a valid use case, but I don't see how it can be cleanly done while keeping TaskGroup only as a UI concept. Would this require extending the TaskGroup concept to the backend?
Best regards, Vikram
On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yrqls21@gmail.com> wrote:
Hi Nathan,
Thanks a lot for your input and it is indeed a valid use case. This can be done either keeping TaskGroup as a UI concept or bringing it into the backend. I'm curious to hear what others think.

Cheers, Kevin Y
On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <Nathan.Hadfield@king.com> wrote:
Hi Kevin,



A quick piece of input from our recent experiences of working with TaskGroup is that we often have dependencies across DAGs that require waiting upon the completion of all the tasks in a group. At the moment, you basically have two options:



 1. Create a sensor task in a DAG for every task in the group
 2. Create a Dummy task after the group that a sensor waits on



So, I would certainly like TaskGroups to have some notion of run status as to better enable downstream decision making.



I’ve already created a feature ticket to try to add some kind of TaskGroup Sensor but perhaps this can also form part of the wider discussions here.



https://github.com/apache/airflow/issues/14563



Cheers,



Nathan



From: Kevin Yang <yrqls21@gmail.com>
Date: Thursday, 4 March 2021 at 05:21
To: dev@airflow.apache.org <dev@airflow.apache.org>
Subject: [DISCUSS] TaskGroup in Tree View

Hi team,



We are very glad to see the introduction of TaskGroup in Airflow 2.0 and really like it. Thanks to Yu Qian and everyone that contributed to it. To continue moving towards the goal of replacing SubDagOperator with TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into Tree View.



Why do we need TaskGroup in Tree View?

For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the preferred view for its loading speed and simpler representation. SubDagOperator is often used to provide an isolated view into a subset of tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will need to support Tree View.



What should TaskGroup look like in Tree View?

We didn't have a conclusion during the 1st iteration of TaskGroup. In Airbnb, we use SubDag mostly for providing a zoom in view on a small set of tasks and the SubDag zoom in feature worked well for us. We'd like to see TaskGroup provide a zoom in option for both Graph View and Tree View but also like to hear everyone's thoughts.



What needs to be in TaskGroup and what doesn't?

TaskGroup started off as a pure UI concept while SubDag is something more, e.g. it has its own DagRun thus isolated scheduling decisions, it can serve as a logical isolation layer that holds different sets of DAG level params, etc. While we only use SubDag as a UI feature, I think it would be a good opportunity for us to discuss what should be TaskGroup and what shouldn't.



Please don't hesitate to share your thoughts.





Cheers,

Kevin Y

Re: [DISCUSS] TaskGroup in Tree View

Posted by Yu Qian <yu...@gmail.com>.
Hi, all, it's really exciting to see the great discussions about TaskGroup.

There are some interesting ideas here.
- Tree View support for TaskGroup: I think this can mostly be achieved at
the web layer? Changes probably involve tree.html and www/views.py. Should
we change Tree View to organize tasks based on the TaskGroup hierarchy (no
need to duplicate tasks in Tree View)? Currently the Tree View is organized
into a flattened graph hierarchy, which means the same task can
appear multiple times in Tree View.
- Clear an entire TaskGroup. We should be able to do this in graph.html and
www/views.py too. E.g. the UI passes the group_id of the TaskGroup to the
web server, which then clears the list of tasks in the TaskGroup. The
TaskGroup is already an iterable of its child tasks, so this should be
possible. In fact, I've heard from several users that they sometimes want
to select multiple tasks on Graph View with the mouse and then clear all of
them at once. This is actually a very similar problem to clearing a
TaskGroup.
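
To make the "clear an entire TaskGroup" idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Airflow at all: the Task and Group classes, the clear_group function and the "none" state string are hypothetical stand-ins, assumed only for illustration. The point is just that a group_id sent by the UI can be resolved to a (possibly nested) group, flattened into its child task ids, and those ids cleared together.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union


@dataclass
class Task:
    """Hypothetical stand-in for a task; only the id matters here."""
    task_id: str


@dataclass
class Group:
    """Hypothetical stand-in for a TaskGroup: an id plus child tasks/groups."""
    group_id: str
    children: List[Union["Group", Task]] = field(default_factory=list)

    def iter_tasks(self) -> Iterator[Task]:
        """Yield every task in this group, descending into nested groups."""
        for child in self.children:
            if isinstance(child, Group):
                yield from child.iter_tasks()
            else:
                yield child

    def find(self, group_id: str) -> "Group":
        """Return the group with the given id, searching nested groups too."""
        if self.group_id == group_id:
            return self
        for child in self.children:
            if isinstance(child, Group):
                try:
                    return child.find(group_id)
                except KeyError:
                    continue
        raise KeyError(group_id)


def clear_group(root: Group, group_id: str, states: Dict[str, str]) -> List[str]:
    """Reset the state of every task under group_id and return the cleared ids."""
    cleared = [task.task_id for task in root.find(group_id).iter_tasks()]
    for task_id in cleared:
        states[task_id] = "none"  # assumed marker for "cleared, to be re-run"
    return cleared


# Example: clearing the "load" group leaves tasks outside the group untouched.
root = Group("root", [
    Task("start"),
    Group("load", [
        Task("load.stage"),
        Group("load.checks", [Task("load.checks.validate")]),
        Task("load.swap"),
    ]),
    Task("end"),
])
states = {task.task_id: "success" for task in root.iter_tasks()}
cleared = clear_group(root, "load", states)
```

The same traversal could in principle feed a Tree View that is organized by group hierarchy, since each task is visited exactly once. Clearing an arbitrary mouse selection of tasks would reduce to the last three lines of clear_group with a different list of ids.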

Some other ideas such as default_args and ExternalTaskSensor support
sound good too. We can probably continue the discussion on those individual
issues/PRs.

On Sun, Mar 7, 2021 at 3:55 AM Xinbin Huang <bi...@gmail.com> wrote:

> Hi Kaxil,
>
> One use case I have is to reuse TaskGroup across different DAGs as a
> predefined sub-workflow. For example, my team is currently building out a
> data platform that will allow a certain level of self-serve ability. Users
> of the platform (mostly analysts and scientists) should focus on business
> logic - the transformation part - and shouldn't need to pay too much
> attention to some standard operations (e.g. from S3 to Redshift staging
> table - validate data - swap to production table), as these types of tasks
> are boring and repetitive. Reusing these sub-workflows also enables us to
> load data to a different destination/warehouse without users needing to
> change their code. We can also have a notification sub-workflow that allows
> us to swap Slack/PagerDuty/etc. in and out over time without impacting the
> user.
>
> Other use cases
> - allow default_args at TaskGroup level as in this issue:
> https://github.com/apache/airflow/issues/13911
> - ExternalTaskSensor on TaskGroup as mentioned by Nathan:
> https://github.com/apache/airflow/issues/14563
> - delete an entire TaskGroup:
> https://github.com/apache/airflow/issues/14529
>
> All these use cases go beyond the pure UI level and require operations
> (viewing/triggering/deleting/waiting/etc) on *a group of tasks. *I think
> we can easily implement/formalize this with the current API without
> changing the backend too much (this PR
> https://github.com/apache/airflow/pull/14640 shows a small example).
>
> What do other people think?
>
> Best
> Bin
>
> On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <ka...@gmail.com> wrote:
>
>> Hi all, interesting discussion. I would love to hear about some more
>> use-cases where TaskGroup needs to be something more than the UI concept.
>>
>> All of Kevin's use-cases can be achieved while keeping it as a UI
>> concept. Xinbin, can you please expand a bit on your use case?
>>
>> Regards,
>> Kaxil
>>
>> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bi...@gmail.com> wrote:
>>
>>> Hi Kevin, Vikram, and Nathan,
>>>
>>> I think we don't need to restrict too much on keeping TaskGroup only as
>>> a UI concept. We are already using TaskGroup to author DAGs and create
>>> dependencies, which already lies a bit outside the UI.
>>> To fully replace SubDagOperator, I think it's necessary to expand
>>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>>
>>> As for TaskGroupSensor specifically, I landed on the same approach as
>>> Kevin, and I have created a draft PR here:
>>> https://github.com/apache/airflow/pull/14640
>>>
>>> Cheers
>>> Bin
>>>
>>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yr...@gmail.com> wrote:
>>>
>>>> Hi Vikram,
>>>>
>>>> Good point. What I had in mind was getting the TaskGroup definition in
>>>> a sensor, e.g. extract the _task_group field from serialized DAG, and query
>>>> the DB for the TI states within.
>>>>
>>>> You are right that it might not be clean nor does it keep TaskGroup as
>>>> a UI concept.
>>>>
>>>>
>>>> Cheers,
>>>> Kevin Y
>>>>
>>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid>
>>>> wrote:
>>>>
>>>>> Kevin,
>>>>>
>>>>> I am not sure I understand your response to Nathan.
>>>>>
>>>>> I agree that it is also a valid use case, but I don't see how it can
>>>>> be cleanly done while keeping TaskGroup only as a UI concept.
>>>>> Would this require extending the TaskGroup concept to the backend?
>>>>>
>>>>> Best regards,
>>>>> Vikram
>>>>>
>>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:
>>>>>
>>>>>> Hi Nathan,
>>>>>>
>>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>>> can be done either keeping TaskGroup as a UI concept or bringing it into
>>>>>> the backend. I'm curious to hear what others think.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Kevin Y
>>>>>>
>>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>>> Nathan.Hadfield@king.com> wrote:
>>>>>>
>>>>>>> Hi Kevin,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> A quick piece of input from our recent experiences of working with
>>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>>>>> you basically have two options:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>>> status as to better enable downstream decision making.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>>> discussions here.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Nathan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *From: *Kevin Yang <yr...@gmail.com>
>>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>>
>>>>>>> Hi team,
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>>>> and really like it. Thanks to Yu Qian and everyone that contributed to it.
>>>>>>> To continue moving towards the goal of replacing SubDagOperator with
>>>>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>>>>> Tree View.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>>
>>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is
>>>>>>> the preferred view for its loading speed and simpler representation.
>>>>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>>>>>> need to support Tree View.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>>
>>>>>>> We didn't have a conclusion during the 1st iteration of TaskGroup.
>>>>>>> In Airbnb, we use SubDag mostly for providing a zoom in view on a small set
>>>>>>> of tasks and the SubDag zoom in feature worked well for us. We'd like to
>>>>>>> see TaskGroup provide a zoom in option for both Graph View and Tree View
>>>>>>> but also like to hear everyone's thoughts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>>
>>>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>>>>>>> serve as a logical isolation layer that holds different sets of DAG level
>>>>>>> params, etc. While we only use SubDag as a UI feature, I think it would be
>>>>>>> a good opportunity for us to discuss what should be TaskGroup and what
>>>>>>> shouldn't.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Please don't hesitate to share your thoughts.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>>
>>>>>>> Kevin Y
>>>>>>>
>>>>>>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Xinbin Huang <bi...@gmail.com>.
Hi Kaxil,

One use case I have is to reuse TaskGroup across different DAGs as a
predefined sub-workflow. For example, my team is currently building out a
data platform that will allow a certain level of self-serve ability. Users
of the platform (mostly analysts and scientists) should focus on business
logic - the transformation part - and shouldn't need to pay too much
attention to some standard operations (e.g. from S3 to Redshift staging
table - validate data - swap to production table), as these types of tasks
are boring and repetitive. Reusing these sub-workflows also enables us to
load data to a different destination/warehouse without users needing to
change their code. We can also have a notification sub-workflow that allows
us to swap Slack/PagerDuty/etc. in and out over time without impacting the
user.

Other use cases
- allow default_args at TaskGroup level as in this issue:
https://github.com/apache/airflow/issues/13911
- ExternalTaskSensor on TaskGroup as mentioned by Nathan:
https://github.com/apache/airflow/issues/14563
- delete an entire TaskGroup: https://github.com/apache/airflow/issues/14529
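
As a rough illustration of the first item (default_args at TaskGroup level), the sketch below shows one possible way group-level defaults could be layered between DAG-level defaults and a task's own arguments. It is plain Python and does not touch Airflow. The function name resolve_task_args and the precedence order (DAG defaults, then enclosing groups from outermost to innermost, then explicit task arguments) are assumptions made for this example, not something the linked issue specifies.

```python
from typing import Any, Dict, Mapping, Sequence


def resolve_task_args(
    dag_default_args: Mapping[str, Any],
    group_default_args_chain: Sequence[Mapping[str, Any]],
    task_kwargs: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge arguments from the least specific source to the most specific.

    group_default_args_chain lists the defaults of every enclosing group,
    outermost group first, so inner groups override outer ones.
    """
    resolved: Dict[str, Any] = dict(dag_default_args)
    for group_args in group_default_args_chain:
        resolved.update(group_args)
    resolved.update(task_kwargs)  # explicit task arguments always win
    return resolved


# Example: the DAG sets retries=1, the group raises it to 3, and one task
# inside the group overrides it again with retries=5.
args = resolve_task_args(
    {"retries": 1, "owner": "data"},
    [{"retries": 3}, {"pool": "redshift"}],
    {"retries": 5},
)
```

Keeping the merge a pure function of three inputs would make the precedence easy to unit test regardless of where it ends up living.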

All these use cases go beyond the pure UI level and require operations
(viewing/triggering/deleting/waiting/etc) on *a group of tasks. *I think we
can easily implement/formalize this with the current API without changing
the backend too much (this PR
https://github.com/apache/airflow/pull/14640 shows
a small example).

What do other people think?

Best
Bin

On Sat, Mar 6, 2021 at 4:51 AM Kaxil Naik <ka...@gmail.com> wrote:

> Hi all, interesting discussion. I would love to hear about some more
> use-cases where TaskGroup needs to be something more than the UI concept.
>
> All of Kevin's use-cases can be achieved while keeping it as a UI
> concept. Xinbin, can you please expand a bit on your use case?
>
> Regards,
> Kaxil
>
> On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bi...@gmail.com> wrote:
>
>> Hi Kevin, Vikram, and Nathan,
>>
>> I think we don't need to restrict too much on keeping TaskGroup only as a
>> UI concept. We are already using TaskGroup to author DAGs and create
>> dependencies, which already lies a bit outside the UI.
>> To fully replace SubDagOperator, I think it's necessary to expand
>> TaskGroup into a *container for tasks* rather than just a UI concept.
>>
>> As for TaskGroupSensor specifically, I landed on the same approach as
>> Kevin, and I have created a draft PR here:
>> https://github.com/apache/airflow/pull/14640
>>
>> Cheers
>> Bin
>>
>> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yr...@gmail.com> wrote:
>>
>>> Hi Vikram,
>>>
>>> Good point. What I had in mind was getting the TaskGroup definition in a
>>> sensor, e.g. extract the _task_group field from serialized DAG, and query
>>> the DB for the TI states within.
>>>
>>> You are right that it might not be clean nor does it keep TaskGroup as a
>>> UI concept.
>>>
>>>
>>> Cheers,
>>> Kevin Y
>>>
>>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid>
>>> wrote:
>>>
>>>> Kevin,
>>>>
>>>> I am not sure I understand your response to Nathan.
>>>>
>>>> I agree that it is also a valid use case, but I don't see how it can be
>>>> cleanly done while keeping TaskGroup only as a UI concept.
>>>> Would this require extending the TaskGroup concept to the backend?
>>>>
>>>> Best regards,
>>>> Vikram
>>>>
>>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:
>>>>
>>>>> Hi Nathan,
>>>>>
>>>>> Thanks a lot for your input and it is indeed a valid use case. This
>>>>> can be done either keeping TaskGroup as a UI concept or bringing it into
>>>>> the backend. I'm curious to hear what others think.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> Kevin Y
>>>>>
>>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>>> Nathan.Hadfield@king.com> wrote:
>>>>>
>>>>>> Hi Kevin,
>>>>>>
>>>>>>
>>>>>>
>>>>>> A quick piece of input from our recent experiences of working with
>>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>>>> you basically have two options:
>>>>>>
>>>>>>
>>>>>>
>>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>>
>>>>>>
>>>>>>
>>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>>> status as to better enable downstream decision making.
>>>>>>
>>>>>>
>>>>>>
>>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>>> discussions here.
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/airflow/issues/14563
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>>
>>>>>>
>>>>>> Nathan
>>>>>>
>>>>>>
>>>>>>
>>>>>> *From: *Kevin Yang <yr...@gmail.com>
>>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>>
>>>>>> Hi team,
>>>>>>
>>>>>>
>>>>>>
>>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>>> and really like it. Thanks to Yu Qian and everyone that contributed to it.
>>>>>> To continue moving towards the goal of replacing SubDagOperator with
>>>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>>>> Tree View.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>>
>>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>>>>> preferred view for its loading speed and simpler representation.
>>>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>>>>> need to support Tree View.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *What should TaskGroup look like in Tree View?*
>>>>>>
>>>>>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>>>>>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>>>>>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>>>>>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>>>>>> also like to hear everyone's thoughts.
>>>>>>
>>>>>>
>>>>>>
>>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>>
>>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>>>>>> serve as a logical isolation layer that holds different sets of DAG level
>>>>>> params, etc. While we only use SubDag as a UI feature, I think it would be
>>>>>> a good opportunity for us to discuss what should be TaskGroup and what
>>>>>> shouldn't.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Please don't hesitate to share your thoughts.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Kevin Y
>>>>>>
>>>>>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Kaxil Naik <ka...@gmail.com>.
Hi all, interesting discussion. I would love to hear about some more
use-cases where TaskGroup needs to be something more than the UI concept.

All of Kevin's use-cases can be achieved while keeping it as a UI
concept. Xinbin, can you please expand a bit on your use case?

Regards,
Kaxil

On Sat, Mar 6, 2021, 10:08 Xinbin Huang <bi...@gmail.com> wrote:

> Hi Kevin, Vikram, and Nathan,
>
> I think we don't need to restrict too much on keeping TaskGroup only as a
> UI concept. We are already using TaskGroup to author DAGs and create
> dependencies, which already lies a bit outside the UI.
> To fully replace SubDagOperator, I think it's necessary to expand
> TaskGroup into a *container for tasks* rather than just a UI concept.
>
> As for TaskGroupSensor specifically, I landed on the same approach as
> Kevin, and I have created a draft PR here:
> https://github.com/apache/airflow/pull/14640
>
> Cheers
> Bin
>
> On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yr...@gmail.com> wrote:
>
>> Hi Vikram,
>>
>> Good point. What I had in mind was getting the TaskGroup definition in a
>> sensor, e.g. extract the _task_group field from serialized DAG, and query
>> the DB for the TI states within.
>>
>> You are right that it might not be clean nor does it keep TaskGroup as a
>> UI concept.
>>
>>
>> Cheers,
>> Kevin Y
>>
>> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid>
>> wrote:
>>
>>> Kevin,
>>>
>>> I am not sure I understand your response to Nathan.
>>>
>>> I agree that it is also a valid use case, but I don't see how it can be
>>> cleanly done while keeping TaskGroup only as a UI concept.
>>> Would this require extending the TaskGroup concept to the backend?
>>>
>>> Best regards,
>>> Vikram
>>>
>>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:
>>>
>>>> Hi Nathan,
>>>>
>>>> Thanks a lot for your input and it is indeed a valid use case. This can
>>>> be done either keeping TaskGroup as a UI concept or bringing it into the
>>>> backend. I'm curious to hear what others think.
>>>>
>>>>
>>>> Cheers,
>>>> Kevin Y
>>>>
>>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>>> Nathan.Hadfield@king.com> wrote:
>>>>
>>>>> Hi Kevin,
>>>>>
>>>>>
>>>>>
>>>>> A quick piece of input from our recent experiences of working with
>>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>>> you basically have two options:
>>>>>
>>>>>
>>>>>
>>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>>
>>>>>
>>>>>
>>>>> So, I would certainly like TaskGroups to have some notion of run
>>>>> status as to better enable downstream decision making.
>>>>>
>>>>>
>>>>>
>>>>> I’ve already created a feature ticket to try to add some kind of
>>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>>> discussions here.
>>>>>
>>>>>
>>>>>
>>>>> https://github.com/apache/airflow/issues/14563
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>>
>>>>>
>>>>> Nathan
>>>>>
>>>>>
>>>>>
>>>>> *From: *Kevin Yang <yr...@gmail.com>
>>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>>
>>>>> Hi team,
>>>>>
>>>>>
>>>>>
>>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>>> and really like it. Thanks to Yu Qian and everyone that contributed to it.
>>>>> To continue moving towards the goal of replacing SubDagOperator with
>>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>>> Tree View.
>>>>>
>>>>>
>>>>>
>>>>> *Why do we need TaskGroup in Tree View?*
>>>>>
>>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>>>> preferred view for its loading speed and simpler representation.
>>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>>>> need to support Tree View.
>>>>>
>>>>>
>>>>>
>>>>> *What should TaskGroup look like in Tree View?*
>>>>>
>>>>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>>>>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>>>>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>>>>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>>>>> also like to hear everyone's thoughts.
>>>>>
>>>>>
>>>>>
>>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>>
>>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>>>>> serve as a logical isolation layer that holds different sets of DAG level
>>>>> params, etc. While we only use SubDag as a UI feature, I think it would be
>>>>> a good opportunity for us to discuss what should be TaskGroup and what
>>>>> shouldn't.
>>>>>
>>>>>
>>>>>
>>>>> Please don't hesitate to share your thoughts.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Kevin Y
>>>>>
>>>>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Xinbin Huang <bi...@gmail.com>.
Hi Kevin, Vikram, and Nathan,

I think we don't need to restrict too much on keeping TaskGroup only as a
UI concept. We are already using TaskGroup to author DAGs and create
dependencies, which already lies a bit outside the UI.
To fully replace SubDagOperator, I think it's necessary to expand TaskGroup
into a *container for tasks* rather than just a UI concept.

As for TaskGroupSensor specifically, I landed on the same approach as
Kevin, and I have created a draft PR here:
https://github.com/apache/airflow/pull/14640
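
For readers who want a feel for the sensor idea without opening the PR, here is a minimal sketch of the decision a TaskGroup sensor would have to make on each poke. It is plain Python and is not taken from the PR. The function name group_status, the returned labels, and the choice to treat "skipped" as done are assumptions. The ti_states mapping stands in for the task instance (TI) states that a real sensor would query from the database.

```python
from typing import Iterable, Mapping, Optional

# State names follow Airflow's task instance states; how to group them
# into "done" and "failed" is an assumption made for this sketch.
SUCCESS_STATES = frozenset({"success", "skipped"})
FAILED_STATES = frozenset({"failed", "upstream_failed"})


def group_status(
    task_ids: Iterable[str],
    ti_states: Mapping[str, Optional[str]],
) -> str:
    """Summarise a group as 'failed', 'success' or 'running'.

    task_ids are the ids of all tasks in the group; ti_states maps a task id
    to its current state, with a missing or None entry meaning "not run yet".
    """
    states = [ti_states.get(task_id) for task_id in task_ids]
    if any(state in FAILED_STATES for state in states):
        return "failed"  # one failed task is enough to fail the group
    if states and all(state in SUCCESS_STATES for state in states):
        return "success"
    return "running"  # keep poking


# Example: the group is only done once every task in it has finished.
status = group_status(
    ["load.stage", "load.swap"],
    {"load.stage": "success", "load.swap": "running"},
)
```

This replaces both workarounds Nathan describes (one sensor per task, or a dummy task after the group) with a single check over the group's task ids.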

Cheers
Bin

On Fri, Mar 5, 2021 at 10:00 PM Kevin Yang <yr...@gmail.com> wrote:

> Hi Vikram,
>
> Good point. What I had in mind was getting the TaskGroup definition in a
> sensor, e.g. extract the _task_group field from serialized DAG, and query
> the DB for the TI states within.
>
> You are right that it might not be clean nor does it keep TaskGroup as a
> UI concept.
>
>
> Cheers,
> Kevin Y
>
> On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid>
> wrote:
>
>> Kevin,
>>
>> I am not sure I understand your response to Nathan.
>>
>> I agree that it is also a valid use case, but I don't see how it can be
>> cleanly done while keeping TaskGroup only as a UI concept.
>> Would this require extending the TaskGroup concept to the backend?
>>
>> Best regards,
>> Vikram
>>
>> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:
>>
>>> Hi Nathan,
>>>
>>> Thanks a lot for your input and it is indeed a valid use case. This can
>>> be done either keeping TaskGroup as a UI concept or bringing it into the
>>> backend. I'm curious to hear what others think.
>>>
>>>
>>> Cheers,
>>> Kevin Y
>>>
>>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <
>>> Nathan.Hadfield@king.com> wrote:
>>>
>>>> Hi Kevin,
>>>>
>>>>
>>>>
>>>> A quick piece of input from our recent experiences of working with
>>>> TaskGroup is that we often have dependencies across DAGs that require
>>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>>> you basically have two options:
>>>>
>>>>
>>>>
>>>>    1. Create a sensor task in a DAG for every task in the group
>>>>    2. Create a Dummy task after the group that a sensor waits on
>>>>
>>>>
>>>>
>>>> So, I would certainly like TaskGroups to have some notion of run status
>>>> as to better enable downstream decision making.
>>>>
>>>>
>>>>
>>>> I’ve already created a feature ticket to try to add some kind of
>>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>>> discussions here.
>>>>
>>>>
>>>>
>>>> https://github.com/apache/airflow/issues/14563
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>>
>>>>
>>>> Nathan
>>>>
>>>>
>>>>
>>>> *From: *Kevin Yang <yr...@gmail.com>
>>>> *Date: *Thursday, 4 March 2021 at 05:21
>>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>>
>>>> Hi team,
>>>>
>>>>
>>>>
>>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0
>>>> and really like it. Thanks to Yu Qian and everyone that contributed to it.
>>>> To continue moving towards the goal of replacing SubDagOperator with
>>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>>> Tree View.
>>>>
>>>>
>>>>
>>>> *Why do we need TaskGroup in Tree View?*
>>>>
>>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>>> preferred view for its loading speed and simpler representation.
>>>> SubDagOperator is often used to provide an isolated view into a subset of
>>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>>> need to support Tree View.
>>>>
>>>>
>>>>
>>>> *What should TaskGroup look like in Tree View?*
>>>>
>>>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>>>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>>>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>>>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>>>> also like to hear everyone's thoughts.
>>>>
>>>>
>>>>
>>>> *What needs to be in TaskGroup and what doesn't?*
>>>>
>>>> TaskGroup started off as a pure UI concept while SubDag is something
>>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>>>> serve as a logical isolation layer that holds different sets of DAG level
>>>> params, etc. While we only use SubDag as a UI feature, I think it would be
>>>> a good opportunity for us to discuss what should be TaskGroup and what
>>>> shouldn't.
>>>>
>>>>
>>>>
>>>> Please don't hesitate to share your thoughts.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Cheers,
>>>>
>>>> Kevin Y
>>>>
>>>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Kevin Yang <yr...@gmail.com>.
Hi Vikram,

Good point. What I had in mind was getting the TaskGroup definition in a
sensor, e.g. extract the _task_group field from serialized DAG, and query
the DB for the TI states within.
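
A rough sketch of that idea, in plain Python so it runs without Airflow. The
dict shape below is only a simplified, hypothetical stand-in for the
serialized _task_group field (the real serialized format may differ), and
ti_states stands in for whatever a DB query on task instances would return.
The helper names collect_task_ids and group_is_done are made up for
illustration.

```python
from typing import Dict, List


def collect_task_ids(group: dict) -> List[str]:
    """Return every task_id under a (possibly nested) group, depth-first.

    Assumed shape: group["children"] maps a label to a pair, either
    ("operator", task_id) or ("taskgroup", nested_group_dict).
    """
    task_ids: List[str] = []
    for kind, value in group["children"].values():
        if kind == "operator":
            task_ids.append(value)
        elif kind == "taskgroup":
            task_ids.extend(collect_task_ids(value))
        else:
            raise ValueError(f"unknown child kind: {kind!r}")
    return task_ids


def group_is_done(group: dict, ti_states: Dict[str, str]) -> bool:
    """True once every task in the group has reached the 'success' state.

    A task with no recorded state counts as not done.
    """
    return all(ti_states.get(t) == "success" for t in collect_task_ids(group))


# Hypothetical group with one task and one nested group.
example_group = {
    "_group_id": "section_1",
    "children": {
        "section_1.task_a": ("operator", "section_1.task_a"),
        "section_1.inner": (
            "taskgroup",
            {
                "_group_id": "section_1.inner",
                "children": {
                    "section_1.inner.task_b": (
                        "operator",
                        "section_1.inner.task_b",
                    ),
                },
            },
        ),
    },
}
```

A sensor built this way would call something like group_is_done on each
poke, which is also why it arguably stops being a pure UI concept: the
sensor depends on the group definition at run time.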

You are right that it might not be clean, nor does it keep TaskGroup as a
UI concept.


Cheers,
Kevin Y

On Fri, Mar 5, 2021 at 8:19 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> Kevin,
>
> I am not sure I understand your response to Nathan.
>
> I agree that it is also a valid use case, but I don't see how it can be
> cleanly done while keeping TaskGroup only as a UI concept.
> Would this require extending the TaskGroup concept to the backend?
>
> Best regards,
> Vikram
>
> On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:
>
>> Hi Nathan,
>>
>> Thanks a lot for your input and it is indeed a valid use case. This can
>> be done either keeping TaskGroup as a UI concept or bringing it into the
>> backend. I'm curious to hear what others think.
>>
>>
>> Cheers,
>> Kevin Y
>>
>> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <Na...@king.com>
>> wrote:
>>
>>> Hi Kevin,
>>>
>>>
>>>
>>> A quick piece of input from our recent experiences of working with
>>> TaskGroup is that we often have dependencies across DAGs that require
>>> waiting upon the completion of all the tasks in a group.  At the moment,
>>> you basically have two options:
>>>
>>>
>>>
>>>    1. Create a sensor task in a DAG for every task in the group
>>>    2. Create a Dummy task after the group that a sensor waits on
>>>
>>>
>>>
>>> So, I would certainly like TaskGroups to have some notion of run status
>>> as to better enable downstream decision making.
>>>
>>>
>>>
>>> I’ve already created a feature ticket to try to add some kind of
>>> TaskGroup Sensor but perhaps this can also form part of the wider
>>> discussions here.
>>>
>>>
>>>
>>> https://github.com/apache/airflow/issues/14563
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Nathan
>>>
>>>
>>>
>>> *From: *Kevin Yang <yr...@gmail.com>
>>> *Date: *Thursday, 4 March 2021 at 05:21
>>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>>
>>> Hi team,
>>>
>>>
>>>
>>> We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
>>> really like it. Thanks to Yu Qian and everyone that contributed to it. To
>>> continue moving towards the goal of replacing SubDagOperator with
>>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>>> Tree View.
>>>
>>>
>>>
>>> *Why do we need TaskGroup in Tree View?*
>>>
>>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>>> preferred view for its loading speed and simpler representation.
>>> SubDagOperator is often used to provide an isolated view into a subset of
>>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>>> need to support Tree View.
>>>
>>>
>>>
>>> *What should TaskGroup look like in Tree View?*
>>>
>>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>>> also like to hear everyone's thoughts.
>>>
>>>
>>>
>>> *What needs to be in TaskGroup and what doesn't?*
>>>
>>> TaskGroup started off as a pure UI concept while SubDag is something
>>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>>> serve as a logical isolation layer that holds different sets of DAG level
>>> params, etc. While we only use SubDag as a UI feature, I think it would be
>>> a good opportunity for us to discuss what should be TaskGroup and what
>>> shouldn't.
>>>
>>>
>>>
>>> Please don't hesitate to share your thoughts.
>>>
>>>
>>>
>>>
>>>
>>> Cheers,
>>>
>>> Kevin Y
>>>
>>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
Kevin,

I am not sure I understand your response to Nathan.

I agree that it is also a valid use case, but I don't see how it can be
cleanly done while keeping TaskGroup only as a UI concept.
Would this require extending the TaskGroup concept to the backend?

Best regards,
Vikram

On Fri, Mar 5, 2021 at 1:31 AM Kevin Yang <yr...@gmail.com> wrote:

> Hi Nathan,
>
> Thanks a lot for your input and it is indeed a valid use case. This can be
> done either keeping TaskGroup as a UI concept or bringing it into the
> backend. I'm curious to hear what others think.
>
>
> Cheers,
> Kevin Y
>
> On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <Na...@king.com>
> wrote:
>
>> Hi Kevin,
>>
>>
>>
>> A quick piece of input from our recent experiences of working with
>> TaskGroup is that we often have dependencies across DAGs that require
>> waiting upon the completion of all the tasks in a group.  At the moment,
>> you basically have two options:
>>
>>
>>
>>    1. Create a sensor task in a DAG for every task in the group
>>    2. Create a Dummy task after the group that a sensor waits on
>>
>>
>>
>> So, I would certainly like TaskGroups to have some notion of run status
>> as to better enable downstream decision making.
>>
>>
>>
>> I’ve already created a feature ticket to try to add some kind of
>> TaskGroup Sensor but perhaps this can also form part of the wider
>> discussions here.
>>
>>
>>
>> https://github.com/apache/airflow/issues/14563
>>
>>
>>
>> Cheers,
>>
>>
>>
>> Nathan
>>
>>
>>
>> *From: *Kevin Yang <yr...@gmail.com>
>> *Date: *Thursday, 4 March 2021 at 05:21
>> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
>> *Subject: *[DISCUSS] TaskGroup in Tree View
>>
>> Hi team,
>>
>>
>>
>> We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
>> really like it. Thanks to Yu Qian and everyone that contributed to it. To
>> continue moving towards the goal of replacing SubDagOperator with
>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>> Tree View.
>>
>>
>>
>> *Why do we need TaskGroup in Tree View?*
>>
>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>> preferred view for its loading speed and simpler representation.
>> SubDagOperator is often used to provide an isolated view into a subset of
>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>> need to support Tree View.
>>
>>
>>
>> *What should TaskGroup look like in Tree View?*
>>
>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>> also like to hear everyone's thoughts.
>>
>>
>>
>> *What needs to be in TaskGroup and what doesn't?*
>>
>> TaskGroup started off as a pure UI concept while SubDag is something
>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>> serve as a logical isolation layer that holds different sets of DAG level
>> params, etc. While we only use SubDag as a UI feature, I think it would be
>> a good opportunity for us to discuss what should be TaskGroup and what
>> shouldn't.
>>
>>
>>
>> Please don't hesitate to share your thoughts.
>>
>>
>>
>>
>>
>> Cheers,
>>
>> Kevin Y
>>
>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Kevin Yang <yr...@gmail.com>.
Hi Nathan,

Thanks a lot for your input and it is indeed a valid use case. This can be
done either keeping TaskGroup as a UI concept or bringing it into the
backend. I'm curious to hear what others think.


Cheers,
Kevin Y

On Thu, Mar 4, 2021 at 12:57 AM Nathan Hadfield <Na...@king.com>
wrote:

> Hi Kevin,
>
>
>
> A quick piece of input from our recent experiences of working with
> TaskGroup is that we often have dependencies across DAGs that require
> waiting upon the completion of all the tasks in a group.  At the moment,
> you basically have two options:
>
>
>
>    1. Create a sensor task in a DAG for every task in the group
>    2. Create a Dummy task after the group that a sensor waits on
>
>
>
> So, I would certainly like TaskGroups to have some notion of run status as
> to better enable downstream decision making.
>
>
>
> I’ve already created a feature ticket to try to add some kind of TaskGroup
> Sensor but perhaps this can also form part of the wider discussions here.
>
>
>
> https://github.com/apache/airflow/issues/14563
>
>
>
> Cheers,
>
>
>
> Nathan
>
>
>
> *From: *Kevin Yang <yr...@gmail.com>
> *Date: *Thursday, 4 March 2021 at 05:21
> *To: *dev@airflow.apache.org <de...@airflow.apache.org>
> *Subject: *[DISCUSS] TaskGroup in Tree View
>
> Hi team,
>
>
>
> We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
> really like it. Thanks to Yu Qian and everyone that contributed to it. To
> continue moving towards the goal of replacing SubDagOperator with
> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
> Tree View.
>
>
>
> *Why do we need TaskGroup in Tree View?*
>
> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
> preferred view for its loading speed and simpler representation.
> SubDagOperator is often used to provide an isolated view into a subset of
> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
> need to support Tree View.
>
>
>
> *What should TaskGroup look like in Tree View?*
>
> We didn't have a conclusion during the 1st iteration of TaskGroup. In
> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
> tasks and the SubDag zoom in feature worked well for us. We'd like to see
> TaskGroup provide a zoom in option for both Graph View and Tree View but
> also like to hear everyone's thoughts.
>
>
>
> *What needs to be in TaskGroup and what doesn't?*
>
> TaskGroup started off as a pure UI concept while SubDag is something more,
> e.g. it has its own DagRun thus isolated scheduling decisions, it can serve
> as a logical isolation layer that holds different sets of DAG level params,
> etc. While we only use SubDag as a UI feature, I think it would be a good
> opportunity for us to discuss what should be TaskGroup and what shouldn't.
>
>
>
> Please don't hesitate to share your thoughts.
>
>
>
>
>
> Cheers,
>
> Kevin Y
>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Nathan Hadfield <Na...@king.com>.
Hi Kevin,

A quick piece of input from our recent experiences of working with TaskGroup is that we often have dependencies across DAGs that require waiting upon the completion of all the tasks in a group.  At the moment, you basically have two options:


  1.  Create a sensor task in a DAG for every task in the group
  2.  Create a Dummy task after the group that a sensor waits on

So, I would certainly like TaskGroups to have some notion of run status so as to better enable downstream decision making.
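
To make "some notion of run status" concrete, here is a minimal sketch in plain Python of how the states of the tasks in a group might be rolled up into one group-level status. The roll-up rule, the function name rollup_group_state, and the "no_tasks" value are assumptions for illustration only, not existing Airflow behaviour; the per-task state strings follow the usual task instance state names.

```python
from collections import Counter
from typing import Iterable


def rollup_group_state(task_states: Iterable[str]) -> str:
    """Collapse per-task states into a single, assumed group status."""
    states = Counter(task_states)
    if not states:
        # Hypothetical marker for an empty group.
        return "no_tasks"
    if states["failed"] or states["upstream_failed"]:
        # Any failure marks the whole group as failed.
        return "failed"
    if states["success"] + states["skipped"] == sum(states.values()):
        # Every task finished without failing.
        return "success"
    # Otherwise at least one task is still pending or in progress.
    return "running"
```

A downstream sensor could then wait on this single status instead of on every task in the group or on a trailing Dummy task.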

I’ve already created a feature ticket to try to add some kind of TaskGroup Sensor but perhaps this can also form part of the wider discussions here.

https://github.com/apache/airflow/issues/14563

Cheers,

Nathan

From: Kevin Yang <yr...@gmail.com>
Date: Thursday, 4 March 2021 at 05:21
To: dev@airflow.apache.org <de...@airflow.apache.org>
Subject: [DISCUSS] TaskGroup in Tree View
Hi team,

We are very glad to see the introduction of TaskGroup in Airflow 2.0 and really like it. Thanks to Yu Qian and everyone that contributed to it. To continue moving towards the goal of replacing SubDagOperator with TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into Tree View.

Why do we need TaskGroup in Tree View?
For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the preferred view for its loading speed and simpler representation. SubDagOperator is often used to provide an isolated view into a subset of tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will need to support Tree View.

What should TaskGroup look like in Tree View?
We didn't have a conclusion during the 1st iteration of TaskGroup. In Airbnb, we use SubDag mostly for providing a zoom in view on a small set of tasks and the SubDag zoom in feature worked well for us. We'd like to see TaskGroup provide a zoom in option for both Graph View and Tree View but also like to hear everyone's thoughts.

What needs to be in TaskGroup and what doesn't?
TaskGroup started off as a pure UI concept while SubDag is something more, e.g. it has its own DagRun thus isolated scheduling decisions, it can serve as a logical isolation layer that holds different sets of DAG level params, etc. While we only use SubDag as a UI feature, I think it would be a good opportunity for us to discuss what should be TaskGroup and what shouldn't.

Please don't hesitate to share your thoughts.


Cheers,
Kevin Y

Re: [DISCUSS] TaskGroup in Tree View

Posted by Kevin Yang <yr...@gmail.com>.
Yes, those are correct. 👍

On Wed, Mar 3, 2021 at 9:47 PM Vikram Koka <vi...@astronomer.io.invalid>
wrote:

> Hey Kevin,
>
> One immediate clarifying question:
> - For your use case, it seems that you want to continue using TaskGroup
> only as a "pure UI concept".
> - But you want its representation to also be in the Tree View.
> - You are not proposing any "execution or scheduling" enhancements [again
> for your use case] with TaskGroups
>
> Is the above correct?
>
> Vikram
>
>
>
> On Wed, Mar 3, 2021 at 9:21 PM Kevin Yang <yr...@gmail.com> wrote:
>
>> Hi team,
>>
>> We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
>> really like it. Thanks to Yu Qian and everyone that contributed to it. To
>> continue moving towards the goal of replacing SubDagOperator with
>> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
>> Tree View.
>>
>> *Why do we need TaskGroup in Tree View?*
>> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
>> preferred view for its loading speed and simpler representation.
>> SubDagOperator is often used to provide an isolated view into a subset of
>> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
>> need to support Tree View.
>>
>> *What should TaskGroup look like in Tree View?*
>> We didn't have a conclusion during the 1st iteration of TaskGroup. In
>> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
>> tasks and the SubDag zoom in feature worked well for us. We'd like to see
>> TaskGroup provide a zoom in option for both Graph View and Tree View but
>> also like to hear everyone's thoughts.
>>
>> *What needs to be in TaskGroup and what doesn't?*
>> TaskGroup started off as a pure UI concept while SubDag is something
>> more, e.g. it has its own DagRun thus isolated scheduling decisions, it can
>> serve as a logical isolation layer that holds different sets of DAG level
>> params, etc. While we only use SubDag as a UI feature, I think it would be
>> a good opportunity for us to discuss what should be TaskGroup and what
>> shouldn't.
>>
>> Please don't hesitate to share your thoughts.
>>
>>
>> Cheers,
>> Kevin Y
>>
>

Re: [DISCUSS] TaskGroup in Tree View

Posted by Vikram Koka <vi...@astronomer.io.INVALID>.
Hey Kevin,

One immediate clarifying question:
- For your use case, it seems that you want to continue using TaskGroup
only as a "pure UI concept".
- But you want its representation to also be in the Tree View.
- You are not proposing any "execution or scheduling" enhancements [again
for your use case] with TaskGroups

Is the above correct?

Vikram



On Wed, Mar 3, 2021 at 9:21 PM Kevin Yang <yr...@gmail.com> wrote:

> Hi team,
>
> We are very glad to see the introduction of TaskGroup in Airflow 2.0 and
> really like it. Thanks to Yu Qian and everyone that contributed to it. To
> continue moving towards the goal of replacing SubDagOperator with
> TaskGroup, I'd like to kick off a discussion on bringing TaskGroup into
> Tree View.
>
> *Why do we need TaskGroup in Tree View?*
> For owners of larger DAGs, say a DAG with 500 tasks, Tree View is the
> preferred view for its loading speed and simpler representation.
> SubDagOperator is often used to provide an isolated view into a subset of
> tasks in such large DAGs. To replace such SubDag use cases, TaskGroup will
> need to support Tree View.
>
> *What should TaskGroup look like in Tree View?*
> We didn't have a conclusion during the 1st iteration of TaskGroup. In
> Airbnb, we use SubDag mostly for providing a zoom in view on a small set of
> tasks and the SubDag zoom in feature worked well for us. We'd like to see
> TaskGroup provide a zoom in option for both Graph View and Tree View but
> also like to hear everyone's thoughts.
>
> *What needs to be in TaskGroup and what doesn't?*
> TaskGroup started off as a pure UI concept while SubDag is something more,
> e.g. it has its own DagRun thus isolated scheduling decisions, it can serve
> as a logical isolation layer that holds different sets of DAG level params,
> etc. While we only use SubDag as a UI feature, I think it would be a good
> opportunity for us to discuss what should be TaskGroup and what shouldn't.
>
> Please don't hesitate to share your thoughts.
>
>
> Cheers,
> Kevin Y
>