Posted to dev@airflow.apache.org by Yu Qian <yu...@gmail.com> on 2020/09/01 07:03:21 UTC

Re: [AIP-34] Rewrite SubDagOperator

The vote for AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
has passed. However, there's an interesting discussion going on here
<https://github.com/apache/airflow/pull/10153#discussion_r480247681>
about whether task_id should be automatically prefixed with the group_id of
its TaskGroup, so I'm bringing it up in this email thread for discussion.

Plan A: Prefix task_id with the group_id of the TaskGroup. This is the
original plan in AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
The task_id argument passed to an operator only needs to be unique within
the TaskGroup. The actual task_id is prefixed with the group_id, so it is
guaranteed to be unique across the DAG.
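To make the Plan A prefixing concrete, here is a minimal pure-Python sketch. It is not Airflow's implementation: the `Group` class is a hypothetical stand-in for TaskGroup, and the "." separator is an assumption. It shows why a nested group must know its parent's group_id, and how the actual task_id grows with nesting depth.

```python
from typing import Optional


class Group:
    """Hypothetical stand-in for a TaskGroup; not Airflow's implementation."""

    def __init__(self, group_id: str, parent: Optional["Group"] = None) -> None:
        self.group_id = group_id
        self.parent = parent

    def full_group_id(self) -> str:
        # Plan A: a nested group needs its parent's full id to build its own.
        if self.parent is None:
            return self.group_id
        return f"{self.parent.full_group_id()}.{self.group_id}"

    def child_task_id(self, task_id: str) -> str:
        # The user-supplied task_id only has to be unique inside this group.
        return f"{self.full_group_id()}.{task_id}"


section_1 = Group("section_1")
inner = Group("inside_section_1", parent=section_1)

print(section_1.child_task_id("task-1"))  # section_1.task-1
print(inner.child_task_id("task-1"))      # section_1.inside_section_1.task-1
```

The same user-supplied "task-1" yields two distinct actual task_ids, which is the convenience Plan A offers; the cost is that the actual ids get longer with every level of nesting.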

Plan B: Do not prefix task_id with the group_id of the TaskGroup. The task_id
argument passed to the operator is the actual task_id, so the user must make
sure task_id is unique across the whole DAG.

Obviously the convenience of Plan A is not free of charge. I'm summarizing
some of the pros and cons in this table. There are two examples at the
bottom illustrating the different usage. houqp's comments on GitHub and some
of my own experiments convinced me that Plan B has more advantages and
avoids surprises. I'm going to update AIP-34
<https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
according to Plan B unless I hear strong objections before 2020-09-03 07:00 UTC.




Ease of Use
    Plan A: Easier to use for new DAGs.
    Plan B: Slightly more work for the user to maintain task_id uniqueness.

Implementation
    Plan A: A little more complicated. Each group needs to know its parent's
            group_id in order to build its own prefix correctly.
    Plan B: Simpler. No need to know the parent TaskGroup's group_id.

Ease of Migration
    Plan A: task_id changes if a TaskGroup is introduced into an existing DAG.
            Existing tasks put into a TaskGroup will appear as new tasks if the
            DAG already has some historical DagRuns. This may pose a barrier to
            adoption of TaskGroup.
    Plan B: No change in task_id when an existing task is put into a TaskGroup,
            so migrating existing DAGs to adopt TaskGroup will be easier.

Actual task_id
    Plan A: Actual task_ids tend to be longer because they are always prefixed
            with the group_id, especially if the task is in a nested TaskGroup.
    Plan B: Actual task_ids tend to be shorter because users control the
            actual task_id themselves.

Graph label
    Plan A: Labels in Graph View tend to be shorter because the task_id only
            needs to be unique within the TaskGroup.
    Plan B: Labels in Graph View tend to be longer because they display the
            actual task_id, which must be unique across the DAG.


Plan A Example:

# Imports assume the Airflow 2.0 module layout.
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.dates import days_ago
from airflow.utils.task_group import TaskGroup


def create_section():
    # Under Plan A these ids only need to be unique within the enclosing group.
    dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]

    with TaskGroup("inside_section_1") as inside_section_1:
        _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]

    with TaskGroup("inside_section_2") as inside_section_2:
        _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]

    dummies[-1] >> inside_section_1
    dummies[-2] >> inside_section_2
    inside_section_1 >> inside_section_2


with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
    start = DummyOperator(task_id="start")

    with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
        create_section()

    some_other_task = DummyOperator(task_id="some-other-task")

    with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
        create_section()

    end = DummyOperator(task_id='end')

    start >> section_1 >> some_other_task >> section_2 >> end


Plan B Example:

# Imports assume the Airflow 2.0 module layout.
from airflow import DAG
from airflow.operators.dummy import DummyOperator
from airflow.utils.dates import days_ago
from airflow.utils.task_group import TaskGroup


def create_section(section_num):
    # Under Plan B ids are not prefixed, so section_num is folded into every
    # id to keep them unique across the whole DAG.
    dummies = [DummyOperator(task_id=f'task-{section_num}.{i + 1}')
               for i in range(5)]

    with TaskGroup(f"section_{section_num}.1") as inside_section_1:
        _ = [DummyOperator(task_id=f'task-{section_num}.1.{i + 1}')
             for i in range(3)]

    with TaskGroup(f"section_{section_num}.2") as inside_section_2:
        _ = [DummyOperator(task_id=f'task-{section_num}.2.{i + 1}')
             for i in range(3)]

    dummies[-1] >> inside_section_1
    dummies[-2] >> inside_section_2
    inside_section_1 >> inside_section_2


with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
    start = DummyOperator(task_id="start")

    with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
        create_section(1)

    some_other_task = DummyOperator(task_id="some-other-task")

    with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
        create_section(2)

    end = DummyOperator(task_id='end')

    start >> section_1 >> some_other_task >> section_2 >> end
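Under Plan B nothing rewrites the ids, so duplicates have to be avoided by the user. The sketch below is plain Python and does not import Airflow. It regenerates the task_ids that the Plan B example produces and checks them for duplicates. The helper names `plan_b_section_ids` and `find_duplicates` are hypothetical.

```python
from collections import Counter
from typing import List


def plan_b_section_ids(section_num: int) -> List[str]:
    # Mirrors the task_id f-strings used by create_section(section_num) above.
    ids = [f"task-{section_num}.{i + 1}" for i in range(5)]
    ids += [f"task-{section_num}.1.{i + 1}" for i in range(3)]
    ids += [f"task-{section_num}.2.{i + 1}" for i in range(3)]
    return ids


def find_duplicates(task_ids: List[str]) -> List[str]:
    # Every task_id that appears more than once, in sorted order.
    return sorted(tid for tid, n in Counter(task_ids).items() if n > 1)


top_level = ["start", "some-other-task", "end"]
all_ids = top_level + plan_b_section_ids(1) + plan_b_section_ids(2)
print(len(all_ids), find_duplicates(all_ids))  # 25 []

# Reusing the same section number twice would collide under Plan B:
clashing = top_level + plan_b_section_ids(1) + plan_b_section_ids(1)
print(find_duplicates(clashing)[:2])  # ['task-1.1', 'task-1.1.1']
```

This is the bookkeeping that the extra section_num parameter buys in the Plan B example; under Plan A the prefix would make it unnecessary.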


On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
<gc...@twitter.com.invalid> wrote:

> Agree on this being non-blocking.
>
> Regarding moving to a vote, you can take care of it. Just open a new email
> thread on the dev list and call for a vote. You can see this example from
> Tomek for AIP-31:
>
> https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
>
> Best,
>
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
>
> > Hi, Gerard, yes I agree it's possible to do this at the UI level without
> > any fundamental change to the implementation. If expand_group() sees that
> > two groups are fully connected (i.e. every task in one parent group depends
> > on every task in another parent group), it can decide to collapse all those
> > child edges into a single edge between the parent groups to reduce the
> > burden on the layout() function. However, I did not find any existing
> > algorithm to do this within dagre, so we'll likely need to implement it
> > ourselves. Another hiccup is that at the moment it doesn't seem to be
> > possible to call setEdge() between two parent groups (aka clusters). If
> > someone has ideas how to do this, please feel free to contribute.
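The "collapse only when fully connected" check can be illustrated independently of dagre. The real logic would live in the Graph View JavaScript; this Python sketch only shows the test, with hypothetical function names and a plain set of (upstream, downstream) pairs standing in for the graph. It also computes the edge density, which is what makes the partially connected (say 90%) case impossible to collapse accurately.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def edge_density(edges: Set[Edge], group_a: List[str], group_b: List[str]) -> float:
    """Fraction of all possible group_a -> group_b task edges that exist."""
    possible = len(group_a) * len(group_b)
    present = sum((a, b) in edges for a in group_a for b in group_b)
    return present / possible if possible else 0.0


def collapse_if_fully_connected(
    edges: Set[Edge], groups: Dict[str, List[str]], a: str, b: str
) -> Set[Edge]:
    """Swap every a -> b task edge for one a -> b group edge, but only when the
    two groups are fully connected (density 1.0); otherwise change nothing."""
    if edge_density(edges, groups[a], groups[b]) < 1.0:
        return set(edges)  # partially connected: one group edge would be inaccurate
    in_a, in_b = set(groups[a]), set(groups[b])
    kept = {(u, v) for u, v in edges if not (u in in_a and v in in_b)}
    kept.add((a, b))
    return kept


groups = {"group1": ["a1", "a2", "a3"], "group2": ["b1", "b2"]}
full = {(u, v) for u in groups["group1"] for v in groups["group2"]}
print(len(full), len(collapse_if_fully_connected(full, groups, "group1", "group2")))  # 6 1

partial = full - {("a1", "b1")}
print(round(edge_density(partial, groups["group1"], groups["group2"]), 2))  # 0.83
print(len(collapse_if_fully_connected(partial, groups, "group1", "group2")))  # 5
```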
> >
> > One other consideration is that this example is only an extreme case.
> > There are in-between cases that still require user intervention. Say 90%
> > of tasks in group1 depend on 90% of tasks in group2, and both groups have
> > more than 100 tasks. This will still produce a lot of edges on the graph,
> > and it's even harder to reduce because the parent groups are not fully
> > connected, so it would be inaccurate to reduce them to a single edge
> > between the parents. In those cases, the user may still need to do
> > something themselves, e.g. adding some DummyOperator to the DAG to cut
> > down the edges. There will be some tradeoff because DummyOperator takes a
> > short while to execute, as you mentioned.
> >
> > There is lots of room for improvement, but I don't think that's a
> > blocking issue for this AIP. So if you can move it to the voting stage,
> > that'll be fantastic.
> >
> >
> > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid> wrote:
> >
> > > +1
> > >
> > > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gc...@twitter.com.INVALID> wrote:
> > > >
> > > > Is it not possible to solve this at the UI level? I.e. tell dagre to only
> > > > add 1 edge to the group instead of to all nodes in the group? No need to
> > > > do SubDag behaviour, just reduce the edges on the graph. That should
> > > > reduce load time, if I understand correctly.
> > > >
> > > > I would strongly avoid the DummyOperator since it will introduce delays
> > > > in operator execution (it will need to execute 1 dummy operator, and
> > > > that can be expensive imo).
> > > >
> > > > Overall, though, the proposal looks good. Unless anyone opposes it, I
> > > > would move this to vote mode :D
> > > >
> > > > Gerard Casas Saez
> > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >
> > > >
> > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> > > >
> > > >> Hi, All,
> > > >> Here's the updated AIP-34
> > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > >> The PR has been fine-tuned with better UI interactions, and
> > > >> serialization of TaskGroup has been added:
> > > >> https://github.com/apache/airflow/pull/10153
> > > >>
> > > >> Here are some experiment results:
> > > >> A made-up DAG containing 403 tasks and 5696 edges, grouped as in the
> > > >> code at the end of this email. Note that inside_section_2 is
> > > >> intentionally made to depend on all tasks in inside_section_1 to
> > > >> generate a large number of edges. The observation is that opening the
> > > >> top-level graph is very quick, around 270ms. Expanding groups that don't
> > > >> have a lot of dense dependencies on other groups is also hardly
> > > >> noticeable, e.g. expanding section_1 takes 330ms. The part that takes
> > > >> time is expanding both inside_section_1 and inside_section_2. Because
> > > >> there are 2500 edges between these two inner groups, it took 63 seconds
> > > >> to expand both of them. The majority of that time (more than 62
> > > >> seconds) is taken by the layout() function in dagre. In other words,
> > > >> it's very fast to add nodes and edges, but laying them out on the graph
> > > >> takes time. This issue is not specific to TaskGroup: without TaskGroup,
> > > >> if a DAG contains too many edges, laying out the graph also takes time.
> > > >>
> > > >> On the other hand, a more realistic experiment with a production DAG
> > > >> containing about 400 tasks and 700 edges showed that grouping tasks
> > > >> into three levels of nested TaskGroup cut the upfront page opening time
> > > >> from around 6s to 500ms. (Obviously the time is paid back when the user
> > > >> gradually expands all the groups one by one, but normally people don't
> > > >> need to expand every group every time, so it's still a big saving.) The
> > > >> experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB
> > > >> memory, Chrome.
> > > >>
> > > >> I can see a few possible improvements to TaskGroup (or how it's used)
> > > >> that can be done as next steps:
> > > >> 1) Like Gerard suggested, we can implement lazy-loading. Instead of
> > > >> displaying the whole DAG, we can limit the Graph View to show only a
> > > >> single TaskGroup, omitting its edges going out to other TaskGroups. This
> > > >> behaviour is more like SubDagOperator, where users can zoom into/out of
> > > >> a TaskGroup and look at only the tasks within that TaskGroup as if those
> > > >> were the only tasks in the DAG. This can be done either with background
> > > >> JavaScript calls or by making a new GET request with filtering
> > > >> parameters. Obviously the downside is that it's not as explicit as
> > > >> showing all the dependencies on the graph.
> > > >> 2) Users can improve the organization of the DAG themselves to reduce
> > > >> the number of edges. E.g. if every task in group2 depends on every task
> > > >> in group1, instead of doing group1 >> group2, they can add a
> > > >> DummyOperator in between and do this: group1 >> dummy >> group2. This
> > > >> cuts down the number of edges significantly and page load becomes much
> > > >> faster.
> > > >> 3) If we really want, we can improve the >> operator of TaskGroup to do
> > > >> 2) automatically. If it sees that both sides of >> are TaskGroups, it
> > > >> can create a DummyOperator on behalf of the user. The downside is that
> > > >> it may be too much magic.
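The saving in item 2) is plain arithmetic: linking every task of an m-task group to every task of an n-task group takes m × n edges, while routing through one join task takes m + n. A plain-Python sketch (not Airflow code; `join` is a hypothetical dummy task id), using the 50-task inner groups from the experiment above:

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def connect_all(upstream: List[str], downstream: List[str]) -> Set[Edge]:
    # group1 >> group2 when every task depends on every task: m * n edges.
    return {(u, d) for u in upstream for d in downstream}


def connect_via_dummy(upstream: List[str], downstream: List[str],
                      dummy: str = "join") -> Set[Edge]:
    # group1 >> dummy >> group2: m edges into the dummy, n edges out of it.
    return {(u, dummy) for u in upstream} | {(dummy, d) for d in downstream}


group1 = [f"inside_section_1.task-{i + 1}" for i in range(50)]
group2 = [f"inside_section_2.task-{i + 1}" for i in range(50)]

print(len(connect_all(group1, group2)))        # 2500
print(len(connect_via_dummy(group1, group2)))  # 100
```

Every group2 task still (transitively) waits for every group1 task, so the ordering is preserved. Item 3) would amount to having >> pick the second form automatically when both operands are TaskGroups; the trade-off is one extra task to schedule and execute.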
> > > >>
> > > >> Thanks,
> > > >> Qian
> > > >>
> > > >> def create_section():
> > > >>     """
> > > >>     Create tasks in the outer section.
> > > >>     """
> > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > >>
> > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > >>
> > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > >>
> > > >>     dummies[-1] >> inside_section_1
> > > >>     dummies[-2] >> inside_section_2
> > > >>     inside_section_1 >> inside_section_2
> > > >>
> > > >>
> > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > >>     start = DummyOperator(task_id="start")
> > > >>
> > > >>     with TaskGroup("section_1") as section_1:
> > > >>         create_section()
> > > >>
> > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > >>
> > > >>     with TaskGroup("section_2") as section_2:
> > > >>         create_section()
> > > >>
> > > >>     end = DummyOperator(task_id='end')
> > > >>
> > > >>     start >> section_1 >> some_other_task >> section_2 >> end
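The 403-task / 5696-edge figures quoted in the experiment can be reproduced without Airflow. The sketch below is plain Python, not the PR's code, and rests on one assumption: that `x >> group` links x to each root of the group (members with no upstream inside it) and `group >> x` links each leaf (members with no downstream inside it). Task ids carry a dotted group prefix only to keep them distinct.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]


class Graph:
    """Tiny stand-in for a DAG: task ids plus (upstream, downstream) edges."""

    def __init__(self) -> None:
        self.tasks: List[str] = []
        self.edges: Set[Edge] = set()

    def add(self, task_ids: List[str]) -> List[str]:
        self.tasks.extend(task_ids)
        return task_ids

    def link(self, upstream: List[str], downstream: List[str]) -> None:
        self.edges |= {(u, d) for u in upstream for d in downstream}

    def roots(self, members: List[str]) -> List[str]:
        # Members with no upstream edge coming from inside the same group.
        inside = set(members)
        has_up = {d for u, d in self.edges if u in inside and d in inside}
        return [t for t in members if t not in has_up]

    def leaves(self, members: List[str]) -> List[str]:
        # Members with no downstream edge going to inside the same group.
        inside = set(members)
        has_down = {u for u, d in self.edges if u in inside and d in inside}
        return [t for t in members if t not in has_down]


def create_section(g: Graph, prefix: str) -> List[str]:
    dummies = g.add([f"{prefix}.task-{i + 1}" for i in range(100)])
    inner_1 = g.add([f"{prefix}.inside_section_1.task-{i + 1}" for i in range(50)])
    inner_2 = g.add([f"{prefix}.inside_section_2.task-{i + 1}" for i in range(50)])
    g.link([dummies[-1]], g.roots(inner_1))      # dummies[-1] >> inside_section_1
    g.link([dummies[-2]], g.roots(inner_2))      # dummies[-2] >> inside_section_2
    g.link(g.leaves(inner_1), g.roots(inner_2))  # inside_section_1 >> inside_section_2
    return dummies + inner_1 + inner_2


g = Graph()
start, other, end = g.add(["start", "some-other-task", "end"])
section_1 = create_section(g, "section_1")
section_2 = create_section(g, "section_2")
g.link([start], g.roots(section_1))              # start >> section_1
g.link(g.leaves(section_1), [other])             # section_1 >> some_other_task
g.link([other], g.roots(section_2))              # some_other_task >> section_2
g.link(g.leaves(section_2), [end])               # section_2 >> end
print(len(g.tasks), len(g.edges))                # 403 5696
```

Under this assumption each section has 100 roots (the dummies) and 148 leaves (98 unlinked dummies plus inside_section_2), so the four top-level links contribute 496 edges and the two sections' internal links (2 × 2600) contribute the other 5200.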
> > > >>
> > > >>
> > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > >> <gc...@twitter.com.invalid> wrote:
> > > >>
> > > >>> Re graph times: that makes sense. Let me know what you find. We may be
> > > >>> able to contribute to the lazy-loading part.
> > > >>>
> > > >>> Looking forward to seeing the updated AIP!
> > > >>>
> > > >>>
> > > >>> Gerard Casas Saez
> > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >>>
> > > >>>
> > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com> wrote:
> > > >>>
> > > >>>> Permissions granted, let me know if you face any issues.
> > > >>>>
> > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com> wrote:
> > > >>>>
> > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > >>>>>
> > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <ka...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> What's your ID? I.e. if you haven't created an account yet, please
> > > >>>>>> create one at https://cwiki.apache.org/confluence/signup.action and
> > > >>>>>> send us your ID and we will add permissions.
> > > >>>>>>
> > > >>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yu...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> Re Xinbin: thanks. I'll edit the AIP. May I request permission to
> > > >>>>>>> edit it? My wiki user email is yuqian1990@gmail.com.
> > > >>>>>>>
> > > >>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the web
> > > >>>>>>> server at once. However, it only adds the top-level nodes and edges
> > > >>>>>>> to the graph when the Graph View page is first opened, and then adds
> > > >>>>>>> the expanded nodes to the graph as the user expands them. From what
> > > >>>>>>> I've experienced with DAGs containing around 400 tasks (not using
> > > >>>>>>> TaskGroup or SubDagOperator), opening the whole DAG in Graph View
> > > >>>>>>> usually takes 5 seconds. Less than 60ms of that is spent loading the
> > > >>>>>>> data from the webserver. The remaining 4.9s+ is taken by JavaScript
> > > >>>>>>> functions in dagre-d3.min.js such as createNodes, createEdgeLabels,
> > > >>>>>>> etc., and by rendering the graph. With TaskGroup being used to group
> > > >>>>>>> tasks into a smaller number of top-level nodes, the amount of data
> > > >>>>>>> loaded from the webserver will remain about the same compared to a
> > > >>>>>>> flat DAG of the same size, but the number of nodes and edges that
> > > >>>>>>> need to be plotted on the graph can be reduced significantly. So in
> > > >>>>>>> theory this should speed up opening Graph View even without
> > > >>>>>>> lazy-loading the data (I'll experiment to find out). That said, if it
> > > >>>>>>> comes to a point where lazy-loading helps, we can still implement it
> > > >>>>>>> as an improvement.
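The "load everything, draw only what is expanded" approach described above can be sketched as a projection from task-level edges to drawn edges. This is illustrative Python, not the Graph View's JavaScript, and for brevity it assumes group membership is encoded as a dotted prefix in each task_id (a convention for this sketch only). Each task is drawn as its outermost collapsed group, and duplicate or group-internal edges disappear.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def visible_node(task_id: str, expanded: Set[str]) -> str:
    """Return the node drawn for a task: its outermost collapsed enclosing
    group, or the task itself when every enclosing group is expanded."""
    parts = task_id.split(".")
    for depth in range(1, len(parts)):
        group = ".".join(parts[:depth])
        if group not in expanded:
            return group
    return task_id


def visible_edges(edges: Set[Edge], expanded: Set[str]) -> Set[Edge]:
    """Project task-level edges onto drawn nodes; duplicates merge in the set
    and edges that stay inside one collapsed group are dropped."""
    projected = {(visible_node(u, expanded), visible_node(d, expanded))
                 for u, d in edges}
    return {(u, d) for u, d in projected if u != d}


edges = {
    ("start", "section_1.task-1"),
    ("start", "section_1.task-2"),
    ("section_1.task-1", "section_1.inner.task-1"),
    ("section_1.inner.task-1", "end"),
    ("section_1.task-2", "end"),
}
print(sorted(visible_edges(edges, expanded=set())))
# [('section_1', 'end'), ('start', 'section_1')]
print(len(visible_edges(edges, expanded={"section_1"})))  # 5
```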
> > > >>>>>>>
> > > >>>>>>> Re James: the Tree View looks as if all the groups are fully expanded
> > > >>>>>>> (because under the hood all the tasks are in a single DAG). I'm less
> > > >>>>>>> worried about Tree View at the moment because it already has a
> > > >>>>>>> mechanism for collapsing tasks by the dependency tree. That said, the
> > > >>>>>>> Tree View can definitely be improved with TaskGroup too (e.g. collapse
> > > >>>>>>> tasks in the same TaskGroup when Tree View is first opened).
> > > >>>>>>>
> > > >>>>>>> For both suggestions, implementing them doesn't require fundamental
> > > >>>>>>> changes to the idea. I think we can have a basic working TaskGroup
> > > >>>>>>> first, and then improve it incrementally in several PRs as we get
> > > >>>>>>> more feedback from the community. What do you think?
> > > >>>>>>>
> > > >>>>>>> Qian
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> I agree this looks great. One question: how does the Tree View look?
> > > >>>>>>>>
> > > >>>>>>>> James Coder
> > > >>>>>>>>
> > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > >>>>>>>>>
> > > >>>>>>>>> First of all, this is awesome!!
> > > >>>>>>>>>
> > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all
> > > >>>>>>>>> operators at once. Wondering if we can load them as needed (i.e.
> > > >>>>>>>>> load them whenever we click the TaskGroup). Some of our DAGs are so
> > > >>>>>>>>> large that they take forever to load in the Graph View, so I'm
> > > >>>>>>>>> worried about this still being an issue here. It may be easily
> > > >>>>>>>>> solvable by implementing lazy loading of the graph. Not sure how
> > > >>>>>>>>> easy that is to implement/add to the UI extension (and I don't want
> > > >>>>>>>>> to push for early optimization, as it's the root of all evil).
> > > >>>>>>>>> Gerard Casas Saez
> > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>> Hi Yu,
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thank you so much for taking this on. I was fairly distracted
> > > >>>>>>>>>> previously and didn't have the time to update the proposal. In
> > > >>>>>>>>>> fact, after discussing with Ash, Kaxil and Daniel, the direction of
> > > >>>>>>>>>> this AIP has changed to favor the concept of TaskGroup instead of
> > > >>>>>>>>>> rewriting SubDagOperator (though it may make sense to deprecate
> > > >>>>>>>>>> SubDag at a future date).
> > > >>>>>>>>>>
> > > >>>>>>>>>> Your PR is amazing and implements the desired features. I think we
> > > >>>>>>>>>> can focus on your new PR instead. Do you mind updating the AIP
> > > >>>>>>>>>> based on what you have done in your PR?
> > > >>>>>>>>>>
> > > >>>>>>>>>> Best,
> > > >>>>>>>>>> Bin
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed
> > > >>>>>>>>>>> implementation of TaskGroup as a UI grouping concept:
> > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup, so I'm
> > > >>>>>>>>>>> quoting it here. The only thing I don't fully agree with is the
> > > >>>>>>>>>>> restriction "... *cannot* have dependencies between a Task in a
> > > >>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task not
> > > >>>>>>>>>>> in any group". I think this is overly restrictive. Since TaskGroup
> > > >>>>>>>>>>> is a UI concept, tasks can have dependencies on tasks in other
> > > >>>>>>>>>>> TaskGroups or not in any TaskGroup. In my PR, this is allowed. The
> > > >>>>>>>>>>> graph edges update accordingly when TaskGroups are
> > > >>>>>>>>>>> expanded/collapsed. TaskGroup only helps make the UI look less
> > > >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks and
> > > >>>>>>>>>>> edges, so things work normally. Here's a screenshot
> > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > >>>>>>>>>>> of the UI interaction.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup,
> > > >>>>>>>>>>>   but *cannot* have dependencies between a Task in a TaskGroup and
> > > >>>>>>>>>>>   either a Task in a different TaskGroup or a Task not in any group
> > > >>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other
> > > >>>>>>>>>>>   TaskGroups or Tasks not in any group
> > > >>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object",
> > > >>>>>>>>>>>   but which you expand or zoom into in some way
> > > >>>>>>>>>>> - You'd need some way to determine what the "status" of a
> > > >>>>>>>>>>>   TaskGroup was, at least for UI display purposes
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional
> > > >>>>>>>>>>> feature of TaskGroup, although that may go against having
> > > >>>>>>>>>>> TaskGroup as a pure UI concept. For the motivating example Jake
> > > >>>>>>>>>>> provided, I suggest implementing both SubmitLongRunningJobTask and
> > > >>>>>>>>>>> PollJobStatusSensor in a single operator. It can do something like
> > > >>>>>>>>>>> what BaseSensorOperator.execute() does in "reschedule" mode, i.e.
> > > >>>>>>>>>>> it first executes some code to submit the long-running job to the
> > > >>>>>>>>>>> external service and stores the state (e.g. in XCom), then
> > > >>>>>>>>>>> reschedules itself. Subsequent runs then poke for the completion
> > > >>>>>>>>>>> state.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> I really like this idea of a TaskGroup container, as I think it
> > > >>>>>>>>>>>> will be much easier to use than SubDag.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry
> > > >>>>>>>>>>>> mechanics via a TaskGroup.retry_all property.
> > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of
> > > >>>>>>>>>>>> SubDag for atomically retrying tasks of the pattern "act on
> > > >>>>>>>>>>>> external state, then reschedule-poll until the desired state is
> > > >>>>>>>>>>>> reached".
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple
> > > >>>>>>>>>>>> two-task group [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
> > > >>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails.
> > > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected to
> > > >>>>>>>>>>>> run a long time (because the sensor can use reschedule mode,
> > > >>>>>>>>>>>> freeing up slots) but might fail for a retryable reason.
> > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the purpose,
> > > >>>>>>>>>>>> because SubDag infamously
> > > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > > >>>>>>>>>>>> This may feel like a cyclic behavior, but in reality it is very
> > > >>>>>>>>>>>> common for a single operator to submit a job and wait until it is
> > > >>>>>>>>>>>> done.
> > > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ,
> > > >>>>>>>>>>>> Dataproc, Dataflow) to be implemented as TaskGroup[SubmitTask >>
> > > >>>>>>>>>>>> PollTask] with an optional reschedule mode if the user knows that
> > > >>>>>>>>>>>> the job may take a long time.
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'd be happy to do the development work on adding this specific
> > > >>>>>>>>>>>> retry behavior to TaskGroup once the base concept is implemented,
> > > >>>>>>>>>>>> if others in the community would find it a useful feature.
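The proposed retry_all behaviour can be illustrated with a plain-Python simulation. Nothing here is an Airflow API: `run_group_with_retry_all`, `submit` and `poll` are hypothetical, and the tasks run in-process in list order. The point is that a failure in the poll step re-runs the submit step too, which is the atomic retry described above.

```python
from typing import Callable, List


def run_group_with_retry_all(tasks: List[Callable[[], None]], retries: int) -> int:
    """Hypothetical retry_all semantics: if any task in the group raises, rerun
    the whole group from its first task. Returns the attempt number that
    succeeded, or re-raises once `retries` extra attempts are used up."""
    attempt = 0
    while True:
        attempt += 1
        try:
            for task in tasks:
                task()
            return attempt
        except Exception:
            if attempt > retries:
                raise


submitted: List[int] = []
poll_outcomes = iter([RuntimeError("job lost"), None])


def submit() -> None:
    submitted.append(len(submitted) + 1)  # records each (re)submission


def poll() -> None:
    outcome = next(poll_outcomes)
    if outcome is not None:
        raise outcome  # the first poll fails for a "retryable" reason


attempt = run_group_with_retry_all([submit, poll], retries=2)
print(attempt, submitted)  # 2 [1, 2]
```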
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>> Jake
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to having regular
> > > >>>>>>>>>>>>> planning, a structured approach to 2.0, and a task force for it
> > > >>>>>>>>>>>>> soon, so I think it should be perfectly fine to discuss this and
> > > >>>>>>>>>>>>> even start implementing what's beyond, as soon as we make sure
> > > >>>>>>>>>>>>> that we are prioritizing 2.0 work.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> J,
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Hi Jarek,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing
> > > >>>>>>>>>>>>>> SubDagOperator until Airflow 2.1. Is it okay to continue the
> > > >>>>>>>>>>>>>> discussion about TaskGroup as a brand new concept/feature,
> > > >>>>>>>>>>>>>> independent of the existing SubDagOperator? In other words, shall
> > > >>>>>>>>>>>>>> we add TaskGroup as a UI grouping concept like Ash suggested, and
> > > >>>>>>>>>>>>>> not touch SubDagOperator at all? Whenever we are ready with
> > > >>>>>>>>>>>>>> TaskGroup, we can then deprecate SubDagOperator in Airflow 2.1.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > > >>>>>>>>>>>>>> into a simple UI grouping concept. I think Xinbin's idea of
> > > >>>>>>>>>>>>>> "reattaching all the tasks to the root DAG" is the way to go. And
> > > >>>>>>>>>>>>>> I see James pointed out we need some helper functions to simplify
> > > >>>>>>>>>>>>>> setting dependencies of TaskGroup. Xinbin put up a pretty elegant
> > > >>>>>>>>>>>>>> example in his PR <https://github.com/apache/airflow/pull/9243>.
> > > >>>>>>>>>>>>>> I think having TaskGroup as a UI concept should be a relatively
> > > >>>>>>>>>>>>>> small change. We can simplify Xinbin's PR further. So I put up
> > > >>>>>>>>>>>>>> this alternative proposal here:
> > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> I have not made any UI changes due to lack of experience with
> > > >>>>>>>>>>>>>> web UI. If anyone's interested, please take a look at the PR.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Qian
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up. Maybe
> > > >>>>>>>>>>>>>>> we should just focus on completing 2.0 and move all discussions
> > > >>>>>>>>>>>>>>> about further improvements to 2.1? While those are important
> > > >>>>>>>>>>>>>>> discussions (and we should continue them in the near future!), I
> > > >>>>>>>>>>>>>>> think at this point delivering 2.0 in its current shape should be
> > > >>>>>>>>>>>>>>> our focus.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> J.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Daniel
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > >>>>>>>>>>>>>>>> object related to task dependencies, but it will not have
> > > >>>>>>>>>>>>>>>> anything related to actual execution or scheduling.
> > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the weekend.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when
> > > >> you
> > > >>>>>>>>>> import
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> object
> > > >>>>>>>>>>>>>>>> you can import it with parameters to determine the
> > > >> shape
> > > >>>> of
> > > >>>>>> the
> > > >>>>>>>>>>>> DAG.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a
> > > >>>>> similar
> > > >>>>>>>>>>>> purpose
> > > >>>>>>>>>>>>>> as
> > > >>>>>>>>>>>>>>> a
> > > >>>>>>>>>>>>>>>> DAG factory function?
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman
> > > >>>>>>>>>>>>>>>> <daniel.imberman@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g.
> > > >>>>>>>>>>>>>>>>> the bitwise operators for task dependencies)? We could even
> > > >>>>>>>>>>>>>>>>> make a “DAGTemplate” object s.t. when you import the object
> > > >>>>>>>>>>>>>>>>> you can import it with parameters to determine the shape of
> > > >>>>>>>>>>>>>>>>> the DAG.
> > > >>>>>>>>>>>>>>>>>
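Xinbin asks above whether Daniel's “DAGTemplate” would serve the same purpose as a DAG factory function. For reference, here is a minimal sketch of the factory idea in plain Python. The function's arguments decide the shape of the graph. `make_fanout_dag`, its parameters, and the dict-based graph representation are hypothetical illustrations, not Airflow API.

```python
from typing import Dict, List


def make_fanout_dag(dag_id: str, num_shards: int) -> Dict[str, List[str]]:
    """Hypothetical DAG factory: the arguments decide the shape of the graph.

    Returns an adjacency mapping {task_id: [downstream task_ids]} for
    start -> shard_0 .. shard_{n-1} -> end.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    start, end = f"{dag_id}.start", f"{dag_id}.end"
    shards = [f"{dag_id}.shard_{i}" for i in range(num_shards)]
    graph: Dict[str, List[str]] = {start: list(shards)}
    for shard in shards:
        graph[shard] = [end]
    graph[end] = []
    return graph


# One factory, differently shaped DAGs depending on the arguments.
small = make_fanout_dag("daily", num_shards=2)
large = make_fanout_dag("backfill", num_shards=5)
print(small["daily.start"])  # ['daily.shard_0', 'daily.shard_1']
print(len(large))            # 7
```

A template object would mainly bundle such parameters with the structure. The sketch does not settle whether that adds anything over a plain function.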
> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang
> > > >>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a parameter
> > > >>>>>>>>>>>>>>>>> itself; it depends on the DAG it is attached to. In my
> > > >>>>>>>>>>>>>>>>> opinion, the TaskGroup will only contain a group of tasks with
> > > >>>>>>>>>>>>>>>>> interdependencies, and the TaskGroup behaves like a task. It
> > > >>>>>>>>>>>>>>>>> doesn't contain any execution/scheduling logic (e.g.
> > > >>>>>>>>>>>>>>>>> schedule_interval, concurrency, max_active_runs, etc.) like a
> > > >>>>>>>>>>>>>>>>> DAG does.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule interval
> > > >>>>>>>>>>>>>>>>>> of DAG is 1 hour and the schedule interval of TaskGroup is 20
> > > >>>>>>>>>>>>>>>>>> min.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > >>>>>>>>>>>>>>>>> want to achieve?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>
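Bin's point is that a TaskGroup has no scheduling settings of its own and only takes what it needs from the DAG it is attached to. That can be modeled in a few lines. This is a hypothetical sketch: `SketchDAG`, `SketchTaskGroup` and `effective_settings` are made-up names, not Airflow classes. The precedence (DAG defaults, then group defaults, then task arguments) is an assumption based on the "take default args from the DAG" docstring in Bin's later snippet.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchDAG:
    # Hypothetical stand-in for a DAG: only the DAG owns scheduling settings.
    dag_id: str
    schedule_interval: str
    default_args: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchTaskGroup:
    # Hypothetical group: no schedule_interval, concurrency, max_active_runs.
    group_id: str
    dag: SketchDAG
    default_args: Optional[Dict[str, Any]] = None


def effective_settings(group: SketchTaskGroup, task_args: Dict[str, Any]) -> Dict[str, Any]:
    """Resolve a task's settings: DAG defaults < group defaults < task args."""
    merged: Dict[str, Any] = dict(group.dag.default_args)
    merged.update(group.default_args or {})  # missing group args fall back to the DAG
    merged.update(task_args)
    # The schedule always comes from the DAG; a group has nothing to override it with.
    merged["schedule_interval"] = group.dag.schedule_interval
    return merged


dag = SketchDAG("hourly_dag", "@hourly", {"retries": 1, "owner": "data"})
group = SketchTaskGroup("download", dag, {"retries": 3})
print(effective_settings(group, {"owner": "bin"}))
# {'retries': 3, 'owner': 'bin', 'schedule_interval': '@hourly'}
```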
> > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰
> > > >>>>>>>>>>>>>>>>> <thanosxnicholas@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>> When using TaskGroup, is the schedule interval of the
> > > >>>>>>>>>>>>>>>>>> TaskGroup the same as that of the parent DAG? My main concern
> > > >>>>>>>>>>>>>>>>>> is whether the schedule interval of a TaskGroup could be
> > > >>>>>>>>>>>>>>>>>> different from that of the DAG. For example, there is the
> > > >>>>>>>>>>>>>>>>>> scenario that the schedule interval of DAG is 1 hour and the
> > > >>>>>>>>>>>>>>>>>> schedule interval of TaskGroup is 20 min.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>> Nicholas
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang
> > > >>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator; maybe
> > > >>>>>>>>>>>>>>>>>>> it will throw an error? But in the original proposal, the
> > > >>>>>>>>>>>>>>>>>>> subdag's schedule_interval will be ignored. Or, if we decide
> > > >>>>>>>>>>>>>>>>>>> to use TaskGroup to replace SubDag, there will be no subdag
> > > >>>>>>>>>>>>>>>>>>> schedule_interval at all.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰
> > > >>>>>>>>>>>>>>>>>>> <thanosxnicholas@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the
> > > >>>>>>>>>>>>>>>>>>>> schedule interval of a SubDAG can differ from that of the
> > > >>>>>>>>>>>>>>>>>>>> parent DAG. I have discussed the schedule interval of SubDAGs
> > > >>>>>>>>>>>>>>>>>>>> with Jiajie Zhong. If the SubDagOperator has a different
> > > >>>>>>>>>>>>>>>>>>>> schedule interval, what will happen when the scheduler
> > > >>>>>>>>>>>>>>>>>>>> schedules the parent DAG?
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang
> > > >>>>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for your feedback!
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdags and task groups. I
> > > >>>>>>>>>>>>>>>>>>>>> think the better approach is to remove subdags entirely and
> > > >>>>>>>>>>>>>>>>>>>>> introduce the concept of a TaskGroup: a container of tasks
> > > >>>>>>>>>>>>>>>>>>>>> along with their dependencies, *without the
> > > >>>>>>>>>>>>>>>>>>>>> execution/scheduling logic of a DAG*. Its only purpose is to
> > > >>>>>>>>>>>>>>>>>>>>> group a list of tasks; you still need to add it to a DAG for
> > > >>>>>>>>>>>>>>>>>>>>> execution.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>>>>>>>>>> from airflow import DAG
> > > >>>>>>>>>>>>>>>>>>>>> from airflow.operators.dummy_operator import DummyOperator
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > >>>>>>>>>>>>>>>>>>>>>     """
> > > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > >>>>>>>>>>>>>>>>>>>>>     """
> > > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > > >>>>>>>>>>>>>>>>>>>>>         pass  # proposed API only: add_task(), `with` and `>>` omitted
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> # You can add tasks to a task group similar to adding tasks to a DAG.
> > > >>>>>>>>>>>>>>>>>>>>> # This can be declared in a separate file from the DAG file.
> > > >>>>>>>>>>>>>>>>>>>>> # (task1, task2 and default_args are assumed to be defined elsewhere.)
> > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group  # alternative way to attach a task
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> # Add it to a DAG for execution.
> > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily') as dag:  # other DAG args elided
> > > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > >>>>>>>>>>>>>>>>>>>>> ```
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks and set
> > > >>>>>>>>>>>>>>>>>>>>> dependencies between them; it avoids the boilerplate code of
> > > >>>>>>>>>>>>>>>>>>>>> using SubDagOperator, and we can declare dependencies as
> > > >>>>>>>>>>>>>>>>>>>>> `task >> task_group >> task`.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Migration-wise, we can introduce it before Airflow 2.0 and
> > > >>>>>>>>>>>>>>>>>>>>> allow a gradual transition. Then we can decide whether we
> > > >>>>>>>>>>>>>>>>>>>>> still want to keep the SubDagOperator or simply remove it.
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>
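The snippet claims that `start >> download_group` expands to dependencies on the group's tasks. That claim can be checked with a tiny pure-Python model. This is a hypothetical sketch, not Airflow's `BaseOperator` or an actual TaskGroup implementation. A group's roots are its tasks with no upstream inside the group, and its leaves are those with no downstream inside the group. An incoming `>>` wires to the roots; an outgoing `>>` wires from the leaves.

```python
from typing import List, Set


class Task:
    """Minimal stand-in for an operator; tracks dependencies by task_id."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: Set[str] = set()
        self.downstream: Set[str] = set()

    def __rshift__(self, other):
        # task >> task, task >> [tasks], or task >> group (wires to the group's roots)
        for target in entry_points(other):
            self.downstream.add(target.task_id)
            target.upstream.add(self.task_id)
        return other

    def __rrshift__(self, others):
        # [task_a, task_b] >> task: lists have no >>, so Python calls this instead.
        for upstream_node in others:
            upstream_node >> self
        return self


class Group:
    """Hypothetical TaskGroup: a named set of tasks with no scheduling logic."""

    def __init__(self, group_id: str, tasks: List[Task]) -> None:
        self.group_id = group_id
        self.tasks = tasks

    def roots(self) -> List[Task]:
        # Tasks with no upstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not t.upstream & ids]

    def leaves(self) -> List[Task]:
        # Tasks with no downstream dependency inside the group.
        ids = {t.task_id for t in self.tasks}
        return [t for t in self.tasks if not t.downstream & ids]

    def __rshift__(self, other):
        # group >> x: every leaf of the group becomes upstream of x.
        for leaf in self.leaves():
            leaf >> other
        return other


def entry_points(node) -> List[Task]:
    """Tasks that an incoming >> should attach to."""
    if isinstance(node, Group):
        return node.roots()
    if isinstance(node, list):
        return [t for item in node for t in entry_points(item)]
    return [node]


task1, task2, task3 = Task("task1"), Task("task2"), Task("task3")
[task1, task2] >> task3
download_group = Group("download", [task1, task2, task3])

start, end = Task("start"), Task("end")
start >> download_group >> end

print(sorted(start.downstream))  # ['task1', 'task2']
print(sorted(end.upstream))      # ['task3']
```

Roots and leaves are computed when `>>` is evaluated, so the group's internal dependencies need to be declared before the group is wired to outside tasks.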
> > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin
> > > >>>>>>>>>>>>>>>>>>>>> <maximebeauchemin@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> +1, proposal looks good.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have task groups and a
> > > >>>>>>>>>>>>>>>>>>>>>> zoom-in/out in the UI. The original reasoning was to reuse the
> > > >>>>>>>>>>>>>>>>>>>>>> DAG object since it is a group of tasks, but as highlighted
> > > >>>>>>>>>>>>>>>>>>>>>> here it does create underlying confusion, since a DAG is much
> > > >>>>>>>>>>>>>>>>>>>>>> more than just a group of tasks.
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> Max
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi
> > > >>>>>>>>>>>>>>>>>>>>>> <joshipoornima06@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang
> > > >>>>>>>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag* method to unpack subdag while parsing, and
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> it will give a flat structure at the task level
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this I think.
> > > >>>>>>>>>>>>>>>>>>>>>>>>> At least if I've understood your idea here correctly.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about the serialized_dag representation, but at
> > > >>>>>>>>>>>>>>>>>>>>>>>> least it will still keep the subdag entry in the DAG table? In
> > > >>>>>>>>>>>>>>>>>>>>>>>> my proposal, as also in the draft PR, the idea is to *extract
> > > >>>>>>>>>>>>>>>>>>>>>>>> the tasks from the subdag and add them back to the root_dag*.
> > > >>>>>>>>>>>>>>>>>>>>>>>> So the runtime DAG graph will look exactly the same as without
> > > >>>>>>>>>>>>>>>>>>>>>>>> subdags, but with metadata attached to those sections. This
> > > >>>>>>>>>>>>>>>>>>>>>>>> metadata will later be used for rendering in the UI. So after
> > > >>>>>>>>>>>>>>>>>>>>>>>> parsing (*DagBag.process_file()*), it will just output the
> > > >>>>>>>>>>>>>>>>>>>>>>>> *root_dag* instead of *root_dag + subdag + subdag + nested
> > > >>>>>>>>>>>>>>>>>>>>>>>> subdag*, etc.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have metadata current_group=section-1,
> > > >>>>>>>>>>>>>>>>>>>>>>>> parent_group=<the-root-dag-id> (naming suggestions welcome).
> > > >>>>>>>>>>>>>>>>>>>>>>>> The reason for parent_group is that we can have nested groups
> > > >>>>>>>>>>>>>>>>>>>>>>>> and still be able to capture the dependency.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
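The unpacking step described above pulls every task out of (possibly nested) subdags into one flat list and tags each task with `current_group` and `parent_group`. It might look like the sketch below. The nested-dict input format and `flatten_groups` are hypothetical; this is not the draft PR's code or `DagBag` internals. Tasks that sit directly in the root DAG get the root id as `current_group` and no `parent_group`.

```python
from typing import Any, Dict, List, Optional

FlatTask = Dict[str, Optional[str]]


def flatten_groups(node: Dict[str, Any], parent_group: Optional[str]) -> List[FlatTask]:
    """Flatten a nested {"group_id", "tasks", "groups"} tree into one task list.

    Every task records the group it came from (current_group) and that
    group's parent (parent_group), so a UI could rebuild the nesting while
    the scheduler only ever sees one flat DAG.
    """
    group_id = node["group_id"]
    flat: List[FlatTask] = [
        {"task_id": task_id, "current_group": group_id, "parent_group": parent_group}
        for task_id in node.get("tasks", [])
    ]
    for child in node.get("groups", []):
        flat.extend(flatten_groups(child, parent_group=group_id))
    return flat


root_dag = {
    "group_id": "root_dag",
    "tasks": ["start", "end"],
    "groups": [
        {
            "group_id": "section-1",
            "tasks": ["section-1-a", "section-1-b"],
            "groups": [{"group_id": "section-1-nested", "tasks": ["section-1-nested-x"]}],
        }
    ],
}

flat = flatten_groups(root_dag, parent_group=None)
print([t["task_id"] for t in flat])
# ['start', 'end', 'section-1-a', 'section-1-b', 'section-1-nested-x']
```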
> > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> In the UI, by utilizing the metadata, what we see would be
> > > >>>>>>>>>>>>>>>>>>>>>>>> something like this, and then we can expand or zoom in in
> > > >>>>>>>>>>>>>>>>>>>>>>>> some way.
> > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag
> > > >>>>>>>>>>>>>>>>>>>>>>>> for execution and scheduling. It will be the same as not
> > > >>>>>>>>>>>>>>>>>>>>>>>> using SubDag.
> > > >>>>>>>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable DAG
> > > >>>>>>>>>>>>>>>>>>>>>>>> code and can declare dependencies between the modules. And
> > > >>>>>>>>>>>>>>>>>>>>>>>> with the new SubDagOperator (see the AIP or draft PR), we can
> > > >>>>>>>>>>>>>>>>>>>>>>>> use the same dag_factory function for generating one DAG,
> > > >>>>>>>>>>>>>>>>>>>>>>>> many dynamic DAGs, or a SubDag (in this case, it will just
> > > >>>>>>>>>>>>>>>>>>>>>>>> extract all underlying tasks and append them to the root DAG).
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to Ash's idea of replacing subdags with a
> > > >>>>>>>>>>>>>>>>>>>>>>>> simpler concept: the proposed change basically drains out the
> > > >>>>>>>>>>>>>>>>>>>>>>>> contents of a SubDag and becomes more like an
> > > >>>>>>>>>>>>>>>>>>>>>>>> ExtractSubdagTasksAndAppendToRootdagOperator (forgive me for
> > > >>>>>>>>>>>>>>>>>>>>>>>> the crazy name...). In this case, is it still necessary to
> > > >>>>>>>>>>>>>>>>>>>>>>>> keep the concept of a subdag, as it is nothing more than a
> > > >>>>>>>>>>>>>>>>>>>>>>>> name?
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks to Chris Palmer
> > > >>>>>>>>>>>>>>>>>>>>>>>> for helping conceptualize the functionality of TaskGroup; I
> > > >>>>>>>>>>>>>>>>>>>>>>>> will just paste it here.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup, but *cannot* have dependencies between a Task in a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task
> > > >>>>>>>>>>>>>>>>>>>>>>>>> not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either
> > > >>>>>>>>>>>>>>>>>>>>>>>>> other TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single
> > > >>>>>>>>>>>>>>>>>>>>>>>>> "object", which you can expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup was, at least for UI display purposes
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & executor), I think
> > > >>>>>>>>>>>>>>>>>>>>>>>> TaskGroup should be ignored during execution (unless we decide
> > > >>>>>>>>>>>>>>>>>>>>>>>> to implement some metadata operations that allow starting or
> > > >>>>>>>>>>>>>>>>>>>>>>>> stopping a group of tasks, etc.).
> > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's view, it should be able to pick up the
> > > >>>>>>>>>>>>>>>>>>>>>>>> individual tasks' statuses and then determine the TaskGroup's
> > > >>>>>>>>>>>>>>>>>>>>>>>> status.
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > >>>>>>>>>>>>>>>>>>>>>>>>
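Bin's last point, deriving a TaskGroup's status from its tasks' statuses, needs a precedence rule, and the thread leaves that rule open. The rule below is an assumption chosen for illustration: any failure wins, then any running, queued or retrying task, then all success or skipped. The state strings follow common Airflow task-instance state names, with `None` meaning "no status yet". `group_status` is a hypothetical helper.

```python
from typing import Iterable, Optional


def group_status(task_states: Iterable[Optional[str]]) -> str:
    """Hypothetical roll-up of task-instance states into one TaskGroup status."""
    states = list(task_states)
    if not states or all(s is None for s in states):
        return "no_status"  # nothing in the group has been scheduled yet
    if any(s in ("failed", "upstream_failed") for s in states):
        return "failed"
    if any(s in ("running", "queued", "up_for_retry") for s in states):
        return "running"
    if all(s in ("success", "skipped") for s in states):
        return "success"
    return "partial"  # e.g. some tasks finished, others not yet scheduled (None)


print(group_status(["success", "running", None]))  # running
print(group_status(["success", "skipped"]))        # success
print(group_status(["success", "failed"]))         # failed
```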
> > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman
> > > >>>>>>>>>>>>>>>>>>>>>>>> <daniel.imberman@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie DAGs
> > > >>>>>>>>>>>>>>>>>>>>>>>>> together, but I think that sounds pretty great! I wonder if we
> > > >>>>>>>>>>>>>>>>>>>>>>>>> could essentially build in the ability to set dependencies on
> > > >>>>>>>>>>>>>>>>>>>>>>>>> all starter tasks for that DAG.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
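Setting a dependency on all "starter" tasks of a DAG amounts to two steps: find the DAG's root tasks (those with no upstream), then wire the upstream task to each of them. Below is a minimal sketch over a plain adjacency mapping. `root_tasks` and `depend_on_dag` are hypothetical helpers, not part of Airflow.

```python
from typing import Dict, List


def root_tasks(graph: Dict[str, List[str]]) -> List[str]:
    """Return the task ids that no other task points to (the 'starter' tasks)."""
    has_upstream = {d for downstream in graph.values() for d in downstream}
    return sorted(t for t in graph if t not in has_upstream)


def depend_on_dag(graph: Dict[str, List[str]], upstream_task: str) -> Dict[str, List[str]]:
    """Return a copy of graph with upstream_task placed before every root task.

    Assumes upstream_task is a new task id that is not already in graph.
    """
    wired = {task: list(downstream) for task, downstream in graph.items()}
    wired[upstream_task] = root_tasks(graph)
    return wired


inner = {"extract_a": ["load"], "extract_b": ["load"], "load": []}
print(root_tasks(inner))                       # ['extract_a', 'extract_b']
print(depend_on_dag(inner, "start")["start"])  # ['extract_a', 'extract_b']
```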
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally OK with SubDag being a mostly UI concept. It
> > > >>>>>>>>>>>>>>>>>>>>>>>>> doesn’t need to execute separately; you’re just adding more
> > > >>>>>>>>>>>>>>>>>>>>>>>>> tasks to the queue that will be executed when there are
> > > >>>>>>>>>>>>>>>>>>>>>>>>> resources available.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer
> > > >>>>>>>>>>>>>>>>>>>>>>>>> <chris@crpalmer.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I
> > > >>>>>>>>>>>>>>>>>>>>>>>>> think what is needed/useful is a TaskGroup concept. At a high
> > > >>>>>>>>>>>>>>>>>>>>>>>>> level I think you want this functionality:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup, but *cannot* have dependencies between a Task in a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup and either a Task in a different TaskGroup or a Task
> > > >>>>>>>>>>>>>>>>>>>>>>>>> not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either
> > > >>>>>>>>>>>>>>>>>>>>>>>>> other TaskGroups or Tasks not in any group
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single
> > > >>>>>>>>>>>>>>>>>>>>>>>>> "object", which you can expand or zoom into in some way
> > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a
> > > >>>>>>>>>>>>>>>>>>>>>>>>> TaskGroup was, at least for UI display purposes
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
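Chris's rules for dependencies between individual tasks can be written as a check on each task-to-task edge, given which group each task belongs to. This is a hypothetical validator: `check_task_edge` and the `group_of` mapping are assumptions for illustration. An edge passes only when both ends are in the same group or both are ungrouped. Dependencies that cross group boundaries are expected to be declared on the TaskGroup itself.

```python
from typing import Dict, Optional


def check_task_edge(upstream: str, downstream: str,
                    group_of: Dict[str, Optional[str]]) -> None:
    """Raise ValueError if a task-to-task edge breaks the proposed TaskGroup rules.

    group_of maps task_id -> group_id, or None for tasks not in any group;
    task ids missing from the mapping are treated as ungrouped.
    """
    up_group = group_of.get(upstream)
    down_group = group_of.get(downstream)
    if up_group == down_group:
        return  # same group, or both ungrouped: allowed
    raise ValueError(
        f"{upstream} (group={up_group}) -> {downstream} (group={down_group}): "
        "a task in a TaskGroup may only depend on tasks in the same group; "
        "declare the dependency on the TaskGroup instead"
    )


group_of = {"a1": "extract", "a2": "extract", "b1": "load", "start": None, "end": None}
check_task_edge("a1", "a2", group_of)      # same group: allowed
check_task_edge("start", "end", group_of)  # both ungrouped: allowed
try:
    check_task_edge("a2", "b1", group_of)  # crosses groups: rejected
except ValueError as err:
    print("rejected:", err)
```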
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top-level object with its own
> > > >>>>>>>>>>>>>>>>>>>>>>>>> database table and model, or just another attribute on tasks.
> > > >>>>>>>>>>>>>>>>>>>>>>>>> I think you could build it in such a way that, from the
> > > >>>>>>>>>>>>>>>>>>>>>>>>> scheduler's point of view, a DAG with TaskGroups doesn't get
> > > >>>>>>>>>>>>>>>>>>>>>>>>> treated any differently. So it really just becomes a shortcut
> > > >>>>>>>>>>>>>>>>>>>>>>>>> for setting dependencies between sets of Tasks, and allows the
> > > >>>>>>>>>>>>>>>>>>>>>>>>> UI to simplify the rendering of the DAG structure.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov
> > > >>>>>>>>>>>>>>>>>>>>>>>>> <ddavydov@twitter.com.invalid> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> issue to fix), but I am still convinced Ash's idea is the
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> right way forward (it just might require a bit more work to
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> deprecate than adding visual grouping in the UI).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this, FYI, with more
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> context on why subdags are bad and potential solutions:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> . A solution I outline there to James's problem is e.g.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> enabling the >> operator for Airflow operators to work with
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> DAGs as well. I see this as separate from Ash's solution for
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> DAG grouping in the UI, but as one of the two items required
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> to replace all existing subdag functionality.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> always a giant pain to use. They are a constant source of
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> user confusion and breakages during upgrades. Would love to
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> see them gone :).
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder
> > > >>>>>>>>>>>>>>>>>>>>>>>>>> <jcoder01@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> the subdag operator to simplify dependencies too. If you
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> have a group of tasks that need to finish before another
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> group of tasks starts, using a subdag is a pretty quick way
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> to set those dependencies, and I think it also makes the DAG
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> code easier to follow.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
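James's case is one group of tasks that must all finish before another group starts. That ordering can be wired either task-by-task or through a single join point, and a group-level dependency would save the user from writing either by hand. The sketch below compares the two options in plain Python and checks that both enforce the same ordering. The task names, the no-op `join` task, and the `ancestors` helper are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def ancestors(task: str, edges: List[Edge]) -> Set[str]:
    """All tasks that must finish before `task` (transitive upstream)."""
    upstream: Dict[str, Set[str]] = {}
    for u, d in edges:
        upstream.setdefault(d, set()).add(u)
    seen: Set[str] = set()
    stack = [task]
    while stack:
        for u in upstream.get(stack.pop(), set()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


group_a = ["a1", "a2", "a3"]  # hypothetical first group: must all finish...
group_b = ["b1", "b2"]        # ...before any task in this group starts

# Option 1: wire every task in A to every task in B (len(A) * len(B) edges).
cross = list(product(group_a, group_b))
# Option 2: funnel through one no-op "join" task (len(A) + len(B) edges).
join = [(a, "join") for a in group_a] + [("join", b) for b in group_b]

for edges in (cross, join):
    # Either way, every task in B has every task in A as an ancestor.
    assert all(set(group_a) <= ancestors(b, edges) for b in group_b)
print(len(cross), len(join))  # 6 5
```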
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>> <hamlin.kn@gmail.com> wrote:
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
>
> Question:
>
> Do we even need the SubDagOperator anymore?
>
> Would removing it entirely and just replacing it with a UI grouping
> concept be conceptually simpler, less to get wrong, and closer to what
> users actually want to achieve with subdags?
>
> With your proposed change, tasks in subdags could start running in
> parallel (a good change) -- so should we not also just _entirely_ remove
> the concept of a sub dag and replace it with something simpler?
>
> Problems with subdags (I think. I haven't used them extensively so may
> be wrong on some of these):
> - They need their own dag_id, but it has(?) to be of the form
>   `parent_dag_id.subdag_id`.
> - They need their own schedule_interval, but it has to match the parent
>   dag.
> - Sub dags can be paused on their own. (Does it make sense to do this?
>   Pausing just a sub dag would mean the sub dag would never execute, so
>   the SubDagOperator would fail too.)
> - You had to choose the executor to operate a subdag with -- always a
>   bit of a kludge.
>
> Thoughts?
>
> -ash
>
> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
>
> Work on sub-dags is much needed; I'm excited to see how this progresses.
>
> > - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
> >   method to unpack subdag while parsing, and it will give a flat
> >   structure at the task level
>
> The serialized_dag representation already does this I think. At least if
> I've understood your idea here correctly.
>
> -ash
>
> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
>
> Hi everyone,
>
> Sending a message to everyone to collect feedback on AIP-34 on
> rewriting SubDagOperator. This was previously briefly mentioned in the
> discussion about what needs to be done for Airflow 2.0, and one of the
> ideas is to make SubDagOperator attach tasks back to the root DAG.
>
> This AIP-34 focuses on solving SubDagOperator related issues by
> reattaching all tasks back to the root dag while respecting dependencies
> during parsing. The original grouping effect on the UI will be achieved
> through grouping related tasks by metadata.
>
> This also makes the dag_factory function more reusable because you don't
> need to have parent_dag_name and child_dag_name in the function signature
> anymore.
>
> Changes proposed:
>
> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag*
>   method to unpack subdag while parsing, and it will give a flat structure
>   at the task level
> - *Simplify SubDagOperator*: The new SubDagOperator acts like a container
>   and most of the original methods are removed. The signature is also
>   changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*. This
>   is similar to the PythonOperator signature.
> - *Add a TaskGroup model and add current_group & parent_group attributes
>   to BaseOperator*: This metadata is used to group tasks for rendering at
>   UI level. It may potentially extend further to group arbitrary tasks
>   outside the context of subdag to allow group-level operations (e.g.
>   stop/trigger a group of tasks within the dag)
> - *Webserver UI for SubDag*: Proposed UI modification to allow
>   (un)collapsing a group of tasks for a flat structure to pair with the
>   first change instead of the original hierarchical structure.
>
> Please see related documents and PRs for details:
> AIP:
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> Original Issue: https://github.com/apache/airflow/issues/8078
> Draft PR: https://github.com/apache/airflow/pull/9243
>
> Please let me know if there are any aspects that you agree/disagree with
> or need more clarification (especially the third change regarding
> TaskGroup). Any comments are welcome and I am looking forward to it!
>
> Cheers
> Bin
>
> --
> Kyle Hamlin
>
> --
> Thanks & Regards
> Poornima
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>
> --
>
> *Jacob Ferriero*
>
> Strategic Cloud Engineer: Data Engineering
>
> jferriero@google.com
>
> 617-714-2509

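The first and third changes in Xinbin's proposal quoted above (unpacking SubDags into a flat, task-level structure while keeping current_group/parent_group metadata for the UI) can be illustrated with a small Airflow-free sketch. `Group` and `flatten` are hypothetical names used only for illustration; this is not how DagBag.bag_dag or the TaskGroup model is actually implemented, and it assumes task_ids are already unique.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FlatMap = Dict[str, Tuple[Optional[str], Optional[str]]]


@dataclass
class Group:
    """Hypothetical stand-in for a SubDag/TaskGroup: an id, its tasks, nested groups."""
    group_id: Optional[str]  # None represents the root DAG itself
    tasks: List[str] = field(default_factory=list)
    children: List["Group"] = field(default_factory=list)


def flatten(group: Group, parent: Optional[str] = None,
            out: Optional[FlatMap] = None) -> FlatMap:
    """Unpack a nested group tree into one flat {task_id: (current_group, parent_group)} map.

    The hierarchy survives only as metadata that a UI could use to draw
    collapsible groups; execution only ever sees the flat task list.
    """
    if out is None:
        out = {}
    for task_id in group.tasks:
        if task_id in out:
            raise ValueError(f"duplicate task_id after flattening: {task_id!r}")
        out[task_id] = (group.group_id, parent)
    for child in group.children:
        flatten(child, group.group_id, out)
    return out


root = Group(None, tasks=["start", "end"], children=[
    Group("section_1", tasks=["extract", "transform"], children=[
        Group("inner", tasks=["load"]),
    ]),
])
flat = flatten(root)
```

Here `flat["load"]` is `("inner", "section_1")`: the task sits directly in the root DAG's task map, and only its metadata remembers where it was nested.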
Re: [AIP-34] Rewrite SubDagOperator

Posted by Yu Qian <yu...@gmail.com>.
Okay. On one hand, we want to automatically prefix task_id so that users
don't have to parametrize task_id themselves inside a TaskGroup to maintain
task_id uniqueness. On the other hand, we don't want people to be surprised
when they introduce TaskGroup to an existing DAG and all of a sudden the
task_ids of existing tasks become prefixed with group_id.

It's actually not difficult to have the best of both worlds. As Gerard
suggested, we can add an option prefix_group_id=True/False to TaskGroup to
control whether child tasks should have their task_id prefixed with
group_id automatically. That way, users who want the Plan A behaviour can
set prefix_group_id=True, and they can set it to False to achieve the Plan B
behaviour. I think it makes sense for prefix_group_id to default to True,
since that's the behaviour AIP-34 already described. Setting
prefix_group_id to False is mostly for keeping the task_ids of existing
DAGs unchanged when adopting TaskGroup.

I'll update the PR <https://github.com/apache/airflow/pull/10153> to have
the option prefix_group_id=True/False unless I hear objections.
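
The prefix_group_id option described above can be modelled in a few lines of plain Python. `ToyTaskGroup` is a hypothetical class, not Airflow's TaskGroup; it assumes that a nested group's own id is prefixed by its parent whenever the parent prefixes, which is exactly why Plan A needs each group to know its parent's group_id.

```python
from typing import Optional


class ToyTaskGroup:
    """Hypothetical, Airflow-free model of the prefix_group_id option."""

    def __init__(self, group_id: Optional[str], parent: Optional["ToyTaskGroup"] = None,
                 prefix_group_id: bool = True) -> None:
        self._group_id = group_id
        self.parent = parent
        self.prefix_group_id = prefix_group_id

    @property
    def group_id(self) -> Optional[str]:
        # A nested group's full id depends on its parent (Plan A's extra
        # bookkeeping): the parent decides whether to prefix it.
        if self.parent is not None:
            return self.parent.child_id(self._group_id)
        return self._group_id

    def child_id(self, label: str) -> str:
        # prefix_group_id=True  -> Plan A: "<full group_id>.<label>"
        # prefix_group_id=False -> Plan B: the label is the actual task_id
        if self.prefix_group_id and self.group_id is not None:
            return f"{self.group_id}.{label}"
        return label


root = ToyTaskGroup(None)  # the DAG itself: never prefixes
section_1 = ToyTaskGroup("section_1", parent=root)
inside = ToyTaskGroup("inside_section_1", parent=section_1)
legacy = ToyTaskGroup("legacy", parent=root, prefix_group_id=False)
```

With the default, `inside.child_id("task-1")` yields `section_1.inside_section_1.task-1`; with `prefix_group_id=False`, an existing task such as `load_table` keeps its id, so its historical DagRuns still line up.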


On Wed, Sep 2, 2020 at 12:08 AM Gerard Casas Saez
<gc...@twitter.com.invalid> wrote:

> As I mentioned in the issue, I believe prefixing group_id is a nice thing
> as it makes TaskGroup an equivalent for SubDagOperator. Internally we have
> a similar concept to TaskGroup called FlattenedSubDagOperator that
> appends the group_id to the task_id.
>
> One of the main usages internally for this operator is hyperparameter
> tuning of ML models. For that we provide an abstraction where users
> provide a SubDag that takes in a dictionary of hyperparameters (through
> XComArg) and pushes an XCom that is a dictionary of metrics. This task
> group is usually a combination of model training and model analysis, but
> it can be whatever you want. We create a hyperparameter tuning DAG for the
> user easily by instantiating this SubDag/TaskGroup as many times as the
> number of experiments they need to perform.
>
> Now, if group_id is not appended to task_id, this type of reuse of task
> groups would not be possible. You would need to ask the user to parametrize
> task_id, and that's a bit counter-intuitive as Airflow task_ids are not
> templatized. Another option is to make this behaviour customizable and have
> a flag that activates it on the TaskGroup.
>
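A toy version of the reuse pattern Gerard describes: one group factory instantiated once per experiment. `build_trial`, `train_model` and `analyze_model` are hypothetical names, the hyperparameter/XCom plumbing is omitted, and the uniqueness check only mimics the rule that task_ids must be unique across a DAG.

```python
from typing import List, Set


def build_trial(group_id: str, prefix: bool) -> List[str]:
    """Hypothetical group factory: the same training/analysis tasks per trial."""
    labels = ["train_model", "analyze_model"]
    return [f"{group_id}.{label}" if prefix else label for label in labels]


def build_tuning_dag(n_trials: int, prefix: bool) -> Set[str]:
    """Instantiate the factory once per trial; task_ids must be unique DAG-wide."""
    seen: Set[str] = set()
    for trial in range(n_trials):
        for task_id in build_trial(f"trial_{trial}", prefix):
            if task_id in seen:
                raise ValueError(f"duplicate task_id {task_id!r}")
            seen.add(task_id)
    return seen


def reuse_is_possible(n_trials: int, prefix: bool) -> bool:
    """True if the factory can be instantiated n_trials times without collisions."""
    try:
        build_tuning_dag(n_trials, prefix)
    except ValueError:
        return False
    return True
```

Without the group_id prefix, the second trial already collides with the first, unless the factory is rewritten to take a parameter that it bakes into every task_id by hand.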
>
> Gerard Casas Saez
> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
>
>
> On Tue, Sep 1, 2020 at 1:03 AM Yu Qian <yu...@gmail.com> wrote:
>
> > The vote for this AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> > passed. However, there's an interesting discussion going on here
> > <https://github.com/apache/airflow/pull/10153#discussion_r480247681>
> > regarding whether task_id should be automatically prefixed with group_id of
> > TaskGroup. So I'm bringing it up in this email thread for discussion.
> >
> > Plan A: Prefix task_id with group_id of TaskGroup. This is the original
> > plan in AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > The task_id argument passed to an operator just needs to be unique across
> > the TaskGroup. The actual task_id is prefixed with the group_id so task_id
> > is guaranteed to be unique across the DAG.
> >
> > Plan B: Do not prefix task_id with group_id of TaskGroup. The task_id
> > argument passed to the operator is the actual task_id. So the user is
> > forced to make sure task_id is unique across the whole DAG.
> >
> > Obviously the convenience of Plan A is not free of charge. I’m summarizing
> > some of the pros and cons in this table. There are two examples at the
> > bottom illustrating the different usage. I was convinced by houqp on the
> > github comments and some of my own experiments that Plan B has more
> > advantages and avoids surprises. I'm going to update AIP-34
> > <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> > according to Plan B unless I hear strong objections before 20200903 7am
> > UTC.
> >
> >
> > Ease of Use
> >   Plan A: Easier to use for new DAGs.
> >   Plan B: Slightly more work on the user to maintain task_id uniqueness.
> >
> > Implementation
> >   Plan A: A little more complicated. Each group needs to know its
> >           parent's group_id in order to prefix the group_id correctly.
> >   Plan B: Simpler. No need to know the parent TaskGroup's group_id.
> >
> > Ease of Migration
> >   Plan A: task_id will change if TaskGroup is introduced into an existing
> >           DAG. Existing tasks put into a TaskGroup will appear like new
> >           tasks if the DAG already has some historical DagRuns. This may
> >           pose a barrier to adoption of TaskGroup.
> >   Plan B: No change in task_id when an existing task is put into a
> >           TaskGroup. Migrating existing DAGs to adopt TaskGroup will be
> >           easier.
> >
> > Actual task_id
> >   Plan A: Actual task_ids tend to be longer because they are always
> >           prefixed with group_id, especially if the task is in a nested
> >           TaskGroup.
> >   Plan B: Actual task_ids tend to be shorter because users control the
> >           actual task_id themselves.
> >
> > Graph label
> >   Plan A: Labels on Graph View tend to be shorter because task_id only
> >           needs to be unique within the TaskGroup.
> >   Plan B: Labels on Graph View tend to be longer because they display the
> >           actual task_id, which is a unique str across the DAG.
> >
> >
> > Plan A Example:
> >
> > from airflow import DAG
> > from airflow.operators.dummy import DummyOperator
> > from airflow.utils.dates import days_ago
> > from airflow.utils.task_group import TaskGroup
> >
> >
> > def create_section():
> >     # Under Plan A these ids only need to be unique within the enclosing
> >     # TaskGroup; the group_id prefix makes them unique across the DAG.
> >     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]
> >
> >     with TaskGroup("inside_section_1") as inside_section_1:
> >         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
> >
> >     with TaskGroup("inside_section_2") as inside_section_2:
> >         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
> >
> >     dummies[-1] >> inside_section_1
> >     dummies[-2] >> inside_section_2
> >     inside_section_1 >> inside_section_2
> >
> >
> > with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> >     start = DummyOperator(task_id="start")
> >
> >     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
> >         create_section()
> >
> >     some_other_task = DummyOperator(task_id="some-other-task")
> >
> >     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
> >         create_section()
> >
> >     end = DummyOperator(task_id='end')
> >
> >     start >> section_1 >> some_other_task >> section_2 >> end
> >
> >
> > Plan B Example:
> >
> > from airflow import DAG
> > from airflow.operators.dummy import DummyOperator
> > from airflow.utils.dates import days_ago
> > from airflow.utils.task_group import TaskGroup
> >
> >
> > def create_section(section_num):
> >     # Under Plan B every task_id must be unique across the whole DAG,
> >     # so the section number is baked into each id by hand.
> >     dummies = [
> >         DummyOperator(task_id=f'task-{section_num}.{i + 1}') for i in range(5)
> >     ]
> >
> >     with TaskGroup(f"section_{section_num}.1") as inside_section_1:
> >         _ = [
> >             DummyOperator(task_id=f'task-{section_num}.1.{i + 1}')
> >             for i in range(3)
> >         ]
> >
> >     with TaskGroup(f"section_{section_num}.2") as inside_section_2:
> >         _ = [
> >             DummyOperator(task_id=f'task-{section_num}.2.{i + 1}')
> >             for i in range(3)
> >         ]
> >
> >     dummies[-1] >> inside_section_1
> >     dummies[-2] >> inside_section_2
> >     inside_section_1 >> inside_section_2
> >
> >
> > with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> >     start = DummyOperator(task_id="start")
> >
> >     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
> >         create_section(1)
> >
> >     some_other_task = DummyOperator(task_id="some-other-task")
> >
> >     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
> >         create_section(2)
> >
> >     end = DummyOperator(task_id='end')
> >
> >     start >> section_1 >> some_other_task >> section_2 >> end
> >
> >
> > On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
> > <gc...@twitter.com.invalid> wrote:
> >
> > > Agree on this being non-blocking.
> > >
> > > Regarding moving to vote, you can take care of it. Just open a new email
> > > thread on the dev list and call for a vote. You can see this example from
> > > Tomek for AIP-31:
> > > https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
> > >
> > > Best,
> > >
> > >
> > > Gerard Casas Saez
> > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > >
> > >
> > > On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
> > >
> > > > Hi, Gerard, yes I agree it's possible to do this at the UI level without
> > > > any fundamental change to the implementation. If expand_group() sees that
> > > > two groups are fully connected (i.e. every task in one parent group
> > > > depends on every task in another parent group), it can decide to collapse
> > > > all those child edges into a single edge between the parent groups to
> > > > reduce the burden of the layout() function. However, I did not find any
> > > > existing algorithm to do this within dagre, so we'll likely need to
> > > > implement this ourselves. Another hiccup is that at the moment it doesn't
> > > > seem to be possible to call setEdge() between two parent groups (aka
> > > > clusters). If someone has ideas how to do this, please feel free to
> > > > contribute.
> > > >
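The edge-collapsing idea above can be sketched on plain Python sets, independent of dagre. `collapse_fully_connected` is a hypothetical helper; it says nothing about how expand_group() or setEdge() actually work, and only illustrates the rule "collapse only when every upstream task feeds every downstream task".

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def collapse_fully_connected(edges: Set[Edge], groups: Dict[str, Set[str]]) -> Set[Edge]:
    """Replace task-level edges with one group-level edge where groups are fully connected.

    Partially connected groups are left untouched, since a single
    group-to-group edge would misstate their dependencies.
    """
    result = set(edges)
    for up_id, up_tasks in groups.items():
        for down_id, down_tasks in groups.items():
            if up_id == down_id:
                continue
            cross = {(u, d) for u in up_tasks for d in down_tasks}
            if cross and cross <= edges:  # every pair is present
                result -= cross
                result.add((up_id, down_id))
    return result


groups = {"group1": {"a1", "a2"}, "group2": {"b1", "b2", "b3"}}
full = {(u, d) for u in groups["group1"] for d in groups["group2"]} | {("start", "a1")}
partial = full - {("a1", "b1")}  # one missing dependency: no longer fully connected
```

Collapsing `full` leaves two edges (`start -> a1` and `group1 -> group2`), while `partial` is returned unchanged, which is the in-between case where the user may still need to restructure the DAG.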
> > > > One other consideration is that this example is only an extreme case.
> > > > There are other in-between cases that still require user intervention.
> > > > Let's say 90% of tasks in group1 depend on 90% of tasks in group2 and
> > > > both groups have more than 100 tasks. This will still cause a lot of
> > > > edges on the graph, and it's even harder to reduce because the parent
> > > > groups are not fully connected, so it's inaccurate to reduce them to a
> > > > single edge between the parents. In those cases, the user may still need
> > > > to do something themselves, e.g. adding some DummyOperator to the DAG to
> > > > cut down the edges. There will be some tradeoff because DummyOperator
> > > > takes a short while to execute, like you mentioned.
> > > >
> > > > There is a lot of room for improvement, but I don't think that's a
> > > > blocking issue for this AIP? So if you can move it to the voting stage,
> > > > that'll be fantastic.
> > > >
> > > >
> > > > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > > On Aug 18, 2020, at 23:55, Gerard Casas Saez <gcasassaez@twitter.com.INVALID> wrote:
> > > > > >
> > > > > > Is it not possible to solve this at the UI level? Aka tell dagre to
> > > > > > only add 1 edge to the group instead of to all nodes in the group? No
> > > > > > need to do SubDag behaviour, but just reduce the edges on the graph.
> > > > > > Should reduce load time if I understand correctly.
> > > > > >
> > > > > > I would strongly avoid the Dummy operator since it will introduce
> > > > > > delays on operator execution (as it will need to execute 1 dummy
> > > > > > operator and that can be expensive imo).
> > > > > >
> > > > > > Overall though, the proposal looks good. Unless anyone opposes it, I
> > > > > > would move this to vote mode :D
> > > > > >
> > > > > > Gerard Casas Saez
> > > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >
> > > > > >
> > > > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com> wrote:
> > > > > >
> > > > > >> Hi, All,
> > > > > >> Here's the updated AIP-34
> > > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > > > >> The PR has been fine-tuned with better UI interactions and added
> > > > > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > > > > >>
> > > > > >> Here are some experiment results:
> > > > > >> A made-up DAG containing 403 tasks and 5696 edges, grouped like this.
> > > > > >> Note that inside_section_2 is intentionally made to depend on all tasks
> > > > > >> in inside_section_1 to generate a large number of edges. The
> > > > > >> observation is that opening the top-level graph is very quick, around
> > > > > >> 270ms. Expanding groups that don't have a lot of dense dependencies on
> > > > > >> other groups is also hardly noticeable, e.g. expanding section_1 takes
> > > > > >> 330ms. The part that takes time is expanding both groups
> > > > > >> inside_section_1 and inside_section_2. Because there are 2500 edges
> > > > > >> between these two inner groups, it took 63 seconds to expand both of
> > > > > >> them. The majority of the time (more than 62 seconds) is actually taken
> > > > > >> by the layout() function in dagre. In other words, it's very fast to
> > > > > >> add nodes and edges, but laying them out on the graph takes time. This
> > > > > >> issue is not actually specific to TaskGroup. Without TaskGroup, if a
> > > > > >> DAG contains too many edges, it takes time to lay out the graph too.
> > > > > >>
> > > > > >> On the other hand, a more realistic experiment with a production DAG
> > > > > >> containing about 400 tasks and 700 edges showed that grouping tasks
> > > > > >> into three levels of nested TaskGroup cut the upfront page opening time
> > > > > >> from around 6s to 500ms. (Obviously the time is paid back when the user
> > > > > >> gradually expands all the groups one by one, but normally people don't
> > > > > >> need to expand every group every time, so it's still a big saving.)
> > > > > >> The experiments were done on OS X Mojave, 2.2 GHz Intel Core i7, 16GB
> > > > > >> memory, Chrome.
> > > > > >>
> > > > > >> I can see a few possible improvements to TaskGroup (or to how it's
> > > > > >> used) that can be done as a next step:
> > > > > >> 1) Like Gerard suggested, we can implement lazy-loading. Instead of
> > > > > >> displaying the whole DAG, we can limit the Graph View to show only a
> > > > > >> single TaskGroup, omitting its edges going out to other TaskGroups.
> > > > > >> This behaviour is more like SubDagOperator, where users can zoom
> > > > > >> into/out of a TaskGroup and look at only the tasks within that
> > > > > >> TaskGroup as if those were the only tasks on the DAG. This can be done
> > > > > >> either with background JavaScript calls or by making a new GET request
> > > > > >> with filtering parameters. Obviously the downside is that it's not as
> > > > > >> explicit as showing all the dependencies on the graph.
> > > > > >> 2) Users can improve the organization of the DAG themselves to reduce
> > > > > >> the number of edges. E.g. if every task in group2 depends on every
> > > > > >> task in group1, instead of doing group1 >> group2, they can add a
> > > > > >> DummyOperator in between and do this: group1 >> dummy >> group2. This
> > > > > >> cuts the number of edges from N * M to N + M (for N tasks in group1
> > > > > >> and M tasks in group2), and page load becomes much faster.
> > > > > >> 3) If we really want, we can improve the >> operator of TaskGroup to
> > > > > >> do 2) automatically. If it sees that both sides of >> are TaskGroups,
> > > > > >> it can create a DummyOperator on behalf of the user. The downside is
> > > > > >> that it may be too much magic.
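The edge arithmetic behind suggestion 2) can be checked with a small plain-Python model. This is a sketch, not Airflow code: tasks are strings, dependencies are (upstream, downstream) pairs, and the node named "join" stands in for the DummyOperator. The helper names all_to_all, via_join and reachable are illustrative only.

```python
from itertools import product


def all_to_all(group1, group2):
    """Edges for group1 >> group2 when every task in group2 depends on every task in group1."""
    return [(a, b) for a, b in product(group1, group2)]


def via_join(group1, group2, join="join"):
    """Edges for group1 >> join >> group2, with one no-op join node in between."""
    return [(a, join) for a in group1] + [(join, b) for b in group2]


def reachable(edges, src):
    """Return every node reachable from src by following edges (iterative DFS)."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, stack = set(), [src]
    while stack:
        for nxt in adj.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


group1 = [f"g1.task-{i}" for i in range(50)]
group2 = [f"g2.task-{i}" for i in range(50)]
direct = all_to_all(group1, group2)
joined = via_join(group1, group2)
print(len(direct), len(joined))  # 2500 100

# The ordering constraints are unchanged: each group2 task is still downstream of each group1 task.
assert all(set(group2) <= reachable(joined, a) for a in group1)
```

With 50 tasks per group, the direct wiring needs 2500 edges to draw while the join needs 100, and the reachability check confirms no ordering constraint is lost.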
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Qian
> > > > > >>
> > > > > >> from airflow import DAG
> > > > > >> from airflow.operators.dummy import DummyOperator
> > > > > >> from airflow.utils.dates import days_ago
> > > > > >> from airflow.utils.task_group import TaskGroup
> > > > > >>
> > > > > >>
> > > > > >> def create_section():
> > > > > >>     """
> > > > > >>     Create tasks in the outer section.
> > > > > >>     """
> > > > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > > > >>
> > > > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > > >>
> > > > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > > >>
> > > > > >>     dummies[-1] >> inside_section_1
> > > > > >>     dummies[-2] >> inside_section_2
> > > > > >>     inside_section_1 >> inside_section_2
> > > > > >>
> > > > > >>
> > > > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > > > >>     start = DummyOperator(task_id="start")
> > > > > >>
> > > > > >>     with TaskGroup("section_1") as section_1:
> > > > > >>         create_section()
> > > > > >>
> > > > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > > > >>
> > > > > >>     with TaskGroup("section_2") as section_2:
> > > > > >>         create_section()
> > > > > >>
> > > > > >>     end = DummyOperator(task_id='end')
> > > > > >>
> > > > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > > > > >>
> > > > > >>
> > > > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > > > >> <gc...@twitter.com.invalid> wrote:
> > > > > >>
> > > > > >>> Re graph times. That makes sense. Let me know what you find. We may
> > > > > >>> be able to contribute to the lazy-loading part.
> > > > > >>>
> > > > > >>> Looking forward to seeing the updated AIP!
> > > > > >>>
> > > > > >>>
> > > > > >>> Gerard Casas Saez
> > > > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >>>
> > > > > >>>
> > > > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > >>>
> > > > > >>>> Permissions granted, let me know if you face any issues.
> > > > > >>>>
> > > > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > >>>>
> > > > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > > > >>>>>
> > > > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxilnaik@gmail.com> wrote:
> > > > > >>>>>
> > > > > >>>>>> What's your ID? If you haven't created an account yet, please
> > > > > >>>>>> create one at https://cwiki.apache.org/confluence/signup.action
> > > > > >>>>>> and send us your ID, and we will add permissions.
> > > > > >>>>>>
> > > > > >>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > > >>>>>>
> > > > > >>>>>>
> > > > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Re Xinbin: Thanks. I'll edit the AIP. May I request permission to
> > > > > >>>>>>> edit it? My wiki user email is yuqian1990@gmail.com.
> > > > > >>>>>>>
> > > > > >>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the web
> > > > > >>>>>>> server at once. However, it only adds the top-level nodes and
> > > > > >>>>>>> edges to the graph when the Graph View page is first opened, and
> > > > > >>>>>>> then adds the expanded nodes to the graph as the user expands
> > > > > >>>>>>> them. From what I've experienced with DAGs containing around 400
> > > > > >>>>>>> tasks (not using TaskGroup or SubDagOperator), opening the whole
> > > > > >>>>>>> DAG in Graph View usually takes 5 seconds. Less than 60ms of that
> > > > > >>>>>>> is spent loading the data from the webserver. The remaining 4.9s+
> > > > > >>>>>>> is taken by JavaScript functions in dagre-d3.min.js such as
> > > > > >>>>>>> createNodes, createEdgeLabels, etc., and by rendering the graph.
> > > > > >>>>>>> With TaskGroup being used to group tasks into a smaller number of
> > > > > >>>>>>> top-level nodes, the amount of data loaded from the webserver will
> > > > > >>>>>>> remain about the same as for a flat DAG of the same size, but the
> > > > > >>>>>>> number of nodes and edges that need to be plotted on the graph can
> > > > > >>>>>>> be reduced significantly. So in theory this should speed up
> > > > > >>>>>>> opening the Graph View even without lazy-loading the data (I'll
> > > > > >>>>>>> experiment to find out). That said, if it comes to a point where
> > > > > >>>>>>> lazy-loading helps, we can still implement it as an improvement.
> > > > > >>>>>>>
> > > > > >>>>>>> Re James: the Tree View looks as if all the groups are fully
> > > > > >>>>>>> expanded (because under the hood all the tasks are in a single
> > > > > >>>>>>> DAG). I'm less worried about the Tree View at the moment because it
> > > > > >>>>>>> already has a mechanism for collapsing tasks by the dependency
> > > > > >>>>>>> tree. That said, the Tree View can definitely be improved with
> > > > > >>>>>>> TaskGroup too (e.g. collapse tasks in the same TaskGroup when the
> > > > > >>>>>>> Tree View is first opened).
> > > > > >>>>>>>
> > > > > >>>>>>> Implementing either suggestion doesn't require fundamental changes
> > > > > >>>>>>> to the idea. I think we can have a basic working TaskGroup first,
> > > > > >>>>>>> and then improve it incrementally in several PRs as we get more
> > > > > >>>>>>> feedback from the community. What do you think?
> > > > > >>>>>>>
> > > > > >>>>>>> Qian
> > > > > >>>>>>>
> > > > > >>>>>>>
> > > > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com> wrote:
> > > > > >>>>>>>
> > > > > >>>>>>>> I agree this looks great. One question: how does the Tree View look?
> > > > > >>>>>>>>
> > > > > >>>>>>>> James Coder
> > > > > >>>>>>>>
> > > > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez <gcasassaez@twitter.com.invalid> wrote:
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> First of all, this is awesome!!
> > > > > >>>>>>>>>
> > > > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all
> > > > > >>>>>>>>> operators at once. Wondering if we can load them as needed (aka
> > > > > >>>>>>>>> load whenever we click the TaskGroup). Some of our DAGs are so
> > > > > >>>>>>>>> large that they take forever to load in the Graph View, so I'm
> > > > > >>>>>>>>> worried this will still be an issue here. It may be easily
> > > > > >>>>>>>>> solvable by implementing lazy loading of the graph. Not sure how
> > > > > >>>>>>>>> easy it is to implement/add to the UI extension (and I don't want
> > > > > >>>>>>>>> to push for early optimization, as it's the root of all evil).
> > > > > >>>>>>>>> Gerard Casas Saez
> > > > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>
> > > > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Hi Yu,
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Thank you so much for taking this on. I was fairly distracted
> > > > > >>>>>>>>>> previously and didn't have the time to update the proposal. In
> > > > > >>>>>>>>>> fact, after discussing with Ash, Kaxil and Daniel, the direction
> > > > > >>>>>>>>>> of this AIP has changed to favor the concept of TaskGroup
> > > > > >>>>>>>>>> instead of rewriting SubDagOperator (though it may make sense to
> > > > > >>>>>>>>>> deprecate SubDag at a future date).
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Your PR is amazing and it has implemented the desired features.
> > > > > >>>>>>>>>> I think we can focus on your new PR instead. Do you mind updating
> > > > > >>>>>>>>>> the AIP based on what you have done in your PR?
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>> Best,
> > > > > >>>>>>>>>> Bin
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>
> > > > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> Hi, all, I've added the basic UI changes to my proposed
> > > > > >>>>>>>>>>> implementation of TaskGroup as a UI grouping concept:
> > > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup, so
> > > > > >>>>>>>>>>> I'm quoting it here. The only thing I don't fully agree with is
> > > > > >>>>>>>>>>> the restriction "... *cannot* have dependencies between a Task
> > > > > >>>>>>>>>>> in a TaskGroup and either a Task in a different TaskGroup or a
> > > > > >>>>>>>>>>> Task not in any group". I think this is over-restrictive. Since
> > > > > >>>>>>>>>>> TaskGroup is a UI concept, tasks can have dependencies on tasks
> > > > > >>>>>>>>>>> in other TaskGroups or on tasks not in any TaskGroup. In my PR,
> > > > > >>>>>>>>>>> this is allowed: the graph edges update accordingly when
> > > > > >>>>>>>>>>> TaskGroups are expanded/collapsed. TaskGroup only helps make the
> > > > > >>>>>>>>>>> UI look less crowded. Under the hood, everything is still a DAG
> > > > > >>>>>>>>>>> of tasks and edges, so things work normally. Here's a screenshot
> > > > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > > >>>>>>>>>>> of the UI interaction.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>    - Tasks can be added to a TaskGroup
> > > > > >>>>>>>>>>>    - You *can* have dependencies between Tasks in the same
> > > > > >>>>>>>>>>>      TaskGroup, but *cannot* have dependencies between a Task in
> > > > > >>>>>>>>>>>      a TaskGroup and either a Task in a different TaskGroup or a
> > > > > >>>>>>>>>>>      Task not in any group
> > > > > >>>>>>>>>>>    - You *can* have dependencies between a TaskGroup and either
> > > > > >>>>>>>>>>>      other TaskGroups or Tasks not in any group
> > > > > >>>>>>>>>>>    - The UI will by default render a TaskGroup as a single
> > > > > >>>>>>>>>>>      "object", but which you expand or zoom into in some way
> > > > > >>>>>>>>>>>    - You'd need some way to determine what the "status" of a
> > > > > >>>>>>>>>>>      TaskGroup was, at least for UI display purposes
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
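The point that cross-group dependencies are ordinary task edges, which the UI simply redraws as groups are expanded or collapsed, can be modeled in a few lines. This is a simplified sketch assuming a single level of grouping and a dict mapping task ids to group ids; it is not the JavaScript in PR #10153, and visible_node/visible_edges are hypothetical names.

```python
def visible_node(task_id, group_of, expanded):
    """Return what a task is drawn as: itself if ungrouped or its group is expanded, else its group."""
    group = group_of.get(task_id)
    return task_id if group is None or group in expanded else group


def visible_edges(edges, group_of, expanded=frozenset()):
    """Project task-level edges onto the edges the graph would draw."""
    drawn = set()
    for upstream, downstream in edges:
        a = visible_node(upstream, group_of, expanded)
        b = visible_node(downstream, group_of, expanded)
        if a != b:  # edges inside a single collapsed group are hidden
            drawn.add((a, b))
    return drawn


group_of = {"g1.a": "g1", "g1.b": "g1", "g2.c": "g2"}
edges = [
    ("start", "g1.a"),
    ("g1.a", "g1.b"),  # inside g1
    ("g1.b", "g2.c"),  # task in g1 -> task in a different group
    ("g1.a", "end"),   # task in g1 -> task not in any group
]
print(sorted(visible_edges(edges, group_of)))
# [('g1', 'end'), ('g1', 'g2'), ('start', 'g1')]
print(sorted(visible_edges(edges, group_of, expanded={"g1"})))
# [('g1.a', 'end'), ('g1.a', 'g1.b'), ('g1.b', 'g2'), ('start', 'g1.a')]
```

The underlying edge list never changes; only the projection does, which is why a task-to-task dependency across groups does not need to be forbidden.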
> > > > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > > > > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional
> > > > > >>>>>>>>>>> feature of TaskGroup, although that may go against having
> > > > > >>>>>>>>>>> TaskGroup as a pure UI concept. For the motivating example Jake
> > > > > >>>>>>>>>>> provided, I suggest implementing both SubmitLongRunningJobTask
> > > > > >>>>>>>>>>> and PollJobStatusSensor in a single operator. It can do
> > > > > >>>>>>>>>>> something like BaseSensorOperator.execute() does in "reschedule"
> > > > > >>>>>>>>>>> mode, i.e. it first executes some code to submit the
> > > > > >>>>>>>>>>> long-running job to the external service and stores the state
> > > > > >>>>>>>>>>> (e.g. in XCom), then reschedules itself. Subsequent runs then
> > > > > >>>>>>>>>>> poke for the completion state.
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>
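The single-operator suggestion above can be sketched as a small state machine. In Airflow this would presumably be a BaseSensorOperator subclass running in mode="reschedule"; to keep the sketch self-contained, a plain dict stands in for XCom and FakeJobService (hypothetical) stands in for the external service. The rescheduling loop at the bottom is only a simulation of repeated tries.

```python
import itertools


class FakeJobService:
    """Hypothetical stand-in for an external job service.

    Each submitted job reports "done" on its polls_needed-th status check.
    """

    def __init__(self, polls_needed=3):
        self.polls_needed = polls_needed
        self._ids = itertools.count(1)
        self._checks = {}

    def submit(self):
        job_id = next(self._ids)
        self._checks[job_id] = 0
        return job_id

    def is_done(self, job_id):
        self._checks[job_id] += 1
        return self._checks[job_id] >= self.polls_needed


class SubmitAndPoll:
    """One task that submits on its first try and only polls on later tries."""

    def __init__(self, service):
        self.service = service

    def poke(self, state):
        # `state` plays the role of XCom: it survives between rescheduled tries.
        if "job_id" not in state:
            state["job_id"] = self.service.submit()
            return False  # just submitted; reschedule and check again later
        return self.service.is_done(state["job_id"])


service = FakeJobService(polls_needed=3)
task = SubmitAndPoll(service)
state = {}
tries = 1
while not task.poke(state):
    tries += 1  # in reschedule mode the worker slot would be released between tries
print(state["job_id"], tries)  # 1 4
```

Only one job is ever submitted; every later try just checks its status, which is what lets the pair live in one operator.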
> > > > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero <jferriero@google.com.invalid> wrote:
> > > > > >>>>>>>>>>>
> > > > > >>>>>>>>>>>> I really like this idea of a TaskGroup container, as I think
> > > > > >>>>>>>>>>>> this will be much easier to use than SubDag.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry
> > > > > >>>>>>>>>>>> mechanics via a TaskGroup.retry_all property.
> > > > > >>>>>>>>>>>> This way I could use TaskGroup to replace my favorite use of
> > > > > >>>>>>>>>>>> SubDag: atomically retrying tasks of the pattern "act on
> > > > > >>>>>>>>>>>> external state, then reschedule poll until the desired state
> > > > > >>>>>>>>>>>> is reached".
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple
> > > > > >>>>>>>>>>>> two-task group: [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
> > > > > >>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails.
> > > > > >>>>>>>>>>>> This pattern would be really nice for jobs that are expected
> > > > > >>>>>>>>>>>> to run a long time (because the sensor can use reschedule mode,
> > > > > >>>>>>>>>>>> freeing up slots) but might fail for a retryable reason.
> > > > > >>>>>>>>>>>> However, using SubDag to meet this use case defeats the
> > > > > >>>>>>>>>>>> purpose, because SubDag infamously
> > > > > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration.
> > > > > >>>>>>>>>>>> This may feel like a cyclic behavior, but in reality it is
> > > > > >>>>>>>>>>>> very common for a single operator to submit a job and wait
> > > > > >>>>>>>>>>>> until it is done.
> > > > > >>>>>>>>>>>> We could use this case to refactor many operators (e.g. BQ,
> > > > > >>>>>>>>>>>> Dataproc, Dataflow) to be implemented as
> > > > > >>>>>>>>>>>> TaskGroup[SubmitTask >> PollTask] with an optional reschedule
> > > > > >>>>>>>>>>>> mode if the user knows that the job may take a long time.
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> I'd be happy to do the development work on adding this
> > > > > >>>>>>>>>>>> specific retry behavior to TaskGroup once the base concept is
> > > > > >>>>>>>>>>>> implemented, if others in the community would find this a
> > > > > >>>>>>>>>>>> useful feature.
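The proposed TaskGroup.retry_all semantics (restart the whole submit/poll group when a later step fails, instead of retrying only the failed step) can be modeled in plain Python. Everything here is hypothetical: retry_all was only a proposal in this thread, and RetryableError, run_group_with_retry_all, submit and poll are illustrative names, not Airflow APIs.

```python
class RetryableError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. the external job died)."""


def run_group_with_retry_all(steps, retries):
    """Run steps in order; on a RetryableError, restart from the first step.

    Returns the attempt number that succeeded; re-raises once retries are exhausted.
    """
    for attempt in range(1, retries + 2):
        try:
            for step in steps:
                step()
            return attempt
        except RetryableError:
            if attempt == retries + 1:
                raise


log = []
submitted = {"count": 0}


def submit():
    submitted["count"] += 1
    log.append(f"submit #{submitted['count']}")


def poll():
    # Simulate the first submitted job failing while it is being polled.
    if submitted["count"] < 2:
        log.append("poll: job failed")
        raise RetryableError("job failed")
    log.append("poll: job succeeded")


attempts = run_group_with_retry_all([submit, poll], retries=2)
print(attempts, log)
# 2 ['submit #1', 'poll: job failed', 'submit #2', 'poll: job succeeded']
```

Retrying only poll would keep polling a job that has already died; restarting from submit is what makes the pair behave atomically.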
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>> Jake
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to having regular
> > > > > >>>>>>>>>>>>> planning, a structured approach to 2.0 and a task force for it
> > > > > >>>>>>>>>>>>> soon, so I think it should be perfectly fine to discuss this,
> > > > > >>>>>>>>>>>>> and even start implementing what's beyond 2.0, as soon as we
> > > > > >>>>>>>>>>>>> make sure that we are prioritizing 2.0 work.
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> J,
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Hi Jarek,
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing
> > > > > >>>>>>>>>>>>>> SubDagOperator till Airflow 2.1. Is it okay to continue the
> > > > > >>>>>>>>>>>>>> discussion about TaskGroup as a brand new concept/feature
> > > > > >>>>>>>>>>>>>> independent from the existing SubDagOperator? In other words,
> > > > > >>>>>>>>>>>>>> shall we add TaskGroup as a UI grouping concept like Ash
> > > > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all? Whenever we
> > > > > >>>>>>>>>>>>>> are ready with TaskGroup, we can then deprecate SubDagOperator
> > > > > >>>>>>>>>>>>>> in Airflow 2.1.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator
> > > > > >>>>>>>>>>>>>> idea into a simple UI grouping concept. I think Xinbin's idea
> > > > > >>>>>>>>>>>>>> of "reattaching all the tasks to the root DAG" is the way to
> > > > > >>>>>>>>>>>>>> go. And I see James pointed out we need some helper functions
> > > > > >>>>>>>>>>>>>> to simplify setting dependencies of TaskGroup. Xinbin put up a
> > > > > >>>>>>>>>>>>>> pretty elegant example in his PR
> > > > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > > > > >>>>>>>>>>>>>> TaskGroup as a UI concept should be a relatively small change.
> > > > > >>>>>>>>>>>>>> We can simplify Xinbin's PR further. So I put up this
> > > > > >>>>>>>>>>>>>> alternative proposal here:
> > > > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> I have not made any UI changes due to lack of experience with
> > > > > >>>>>>>>>>>>>> web UI. If anyone's interested, please take a look at the PR.
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> Qian
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk <Jarek.Potiuk@polidea.com> wrote:
> > > > > >>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up.
> > > > > >>>>>>>>>>>>>>> Maybe we should just focus on completing 2.0 and move all
> > > > > >>>>>>>>>>>>>>> discussions about further improvements to 2.1? While those are
> > > > > >>>>>>>>>>>>>>> important discussions (and we should continue them in the near
> > > > > >>>>>>>>>>>>>>> future!), I think at this point delivering 2.0 in its current
> > > > > >>>>>>>>>>>>>>> shape should be our focus.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> J.
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Hi Daniel
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > > > >>>>>>>>>>>>>>>> object related to task dependencies, but it will not have
> > > > > >>>>>>>>>>>>>>>> anything related to actual execution or scheduling.
> > > > > >>>>>>>>>>>>>>>> I will update the AIP according to this over the weekend.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > > >>>>>>>>>>>>>>>>> import the object you can import it with parameters to
> > > > > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > > > > >>>>>>>>>>>>>>>> purpose to a DAG factory function?
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object
> > > > > >>>>>>>>>>>>>>>>> (e.g. the >> and << bitshift operators for task
> > > > > >>>>>>>>>>>>>>>>> dependencies)? We could even make a “DAGTemplate” object
> > > > > >>>>>>>>>>>>>>>>> s.t. when you import the object you can import it with
> > > > > >>>>>>>>>>>>>>>>> parameters to determine the shape of the DAG.
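One way to read the “DAGTemplate” idea, and Xinbin's question about DAG factory functions, is a plain function whose arguments decide the shape of the task graph it builds. The sketch below is hypothetical (make_fan_in_group is not an Airflow API) and returns task ids and edges rather than real operators, so it runs without Airflow.

```python
def make_fan_in_group(group_id, sources):
    """Build one download task per source plus a final merge task (hypothetical shape).

    Returns (task_ids, edges); calling it with different sources yields a different shape.
    """
    downloads = [f"{group_id}.download_{src}" for src in sources]
    merge = f"{group_id}.merge"
    edges = [(task_id, merge) for task_id in downloads]
    return downloads + [merge], edges


tasks, edges = make_fan_in_group("download", ["s3", "gcs"])
print(tasks)
# ['download.download_s3', 'download.download_gcs', 'download.merge']
print(edges)
# [('download.download_s3', 'download.merge'), ('download.download_gcs', 'download.merge')]
```

The earlier create_section() example in this thread follows the same pattern with real DummyOperator and TaskGroup objects.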
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>> The TaskGroup will not take schedule interval as a
> > > > > >>>>>>>>>>>>>>>>> parameter itself; it depends on the DAG it is attached to.
> > > > > >>>>>>>>>>>>>>>>> In my opinion, the TaskGroup will only contain a group of
> > > > > >>>>>>>>>>>>>>>>> tasks with interdependencies, and the TaskGroup behaves like
> > > > > >>>>>>>>>>>>>>>>> a task. It doesn't contain any execution/scheduling logic
> > > > > >>>>>>>>>>>>>>>>> (i.e. schedule_interval, concurrency, max_active_runs etc.)
> > > > > >>>>>>>>>>>>>>>>> like a DAG does.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule
> > > > > >>>>>>>>>>>>>>>>>> interval of DAG is 1 hour and the schedule interval of
> > > > > >>>>>>>>>>>>>>>>>> TaskGroup is 20 min.
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > > > >>>>>>>>>>>>>>>>> want to achieve?
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>> Using TaskGroup, is the schedule interval of TaskGroup the
> > > > > >>>>>>>>>>>>>>>>>> same as that of the parent DAG? My main concern is whether
> > > > > >>>>>>>>>>>>>>>>>> the schedule interval of TaskGroup could be different from
> > > > > >>>>>>>>>>>>>>>>>> that of the DAG. For example, there is the scenario that
> > > > > >>>>>>>>>>>>>>>>>> the schedule interval of DAG is 1 hour and the schedule
> > > > > >>>>>>>>>>>>>>>>>> interval of TaskGroup is 20 min.
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> Cheers,
> > > > > >>>>>>>>>>>>>>>>>> Nicholas
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator;
> > > > > >>>>>>>>>>>>>>>>>>> maybe it will throw an error? But in the original proposal,
> > > > > >>>>>>>>>>>>>>>>>>> the subdag's schedule_interval will be ignored. Or if we
> > > > > >>>>>>>>>>>>>>>>>>> decide to use TaskGroup to replace SubDag, there will be no
> > > > > >>>>>>>>>>>>>>>>>>> subdag schedule_interval.
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> Bin
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰 <thanosxnicholas@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the
> > > > > >>>>>>>>>>>>>>>>>>>> schedule interval of a SubDAG can be different from that
> > > > > >>>>>>>>>>>>>>>>>>>> of the parent DAG. I have discussed the schedule interval
> > > > > >>>>>>>>>>>>>>>>>>>> of SubDAG with Jiajie Zhong. If the SubDagOperator has a
> > > > > >>>>>>>>>>>>>>>>>>>> different schedule interval, what will happen when the
> > > > > >>>>>>>>>>>>>>>>>>>> scheduler schedules the parent DAG?
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > > >>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for the feedback!
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdags and task groups.
> > > > > >>>>>>>>>>>>>>>>>>>>> I think the better way to approach this is to entirely
> > > > > >>>>>>>>>>>>>>>>>>>>> remove subdag and introduce the concept of TaskGroup,
> > > > > >>>>>>>>>>>>>>>>>>>>> which is a container of tasks along with their
> > > > > >>>>>>>>>>>>>>>>>>>>> dependencies *without execution/scheduling logic as a
> > > > > >>>>>>>>>>>>>>>>>>>>> DAG*. Its only purpose is to group a list of tasks, but
> > > > > >>>>>>>>>>>>>>>>>>>>> you still need to add it to a DAG for execution.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args from the DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > > > > >>>>>>>>>>>>>>>>>>>>>         pass
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to a DAG.
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the DAG file.
> > > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download', default_args=default_args)
> > > > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > > > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>>
> > > > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily', ...) as dag:
> > > > > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > > > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > > > > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > > > > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > > > >>>>>>>>>>>>>>>>>>>>> ```
>
> With this, we can still reuse a group of tasks and set dependencies
> between them; it avoids the boilerplate code of SubDagOperator, and we
> can declare dependencies as `task >> task_group >> task`.
>
> As for user migration, we can introduce it before Airflow 2.0 and allow
> a gradual transition. Then we can decide whether to keep the
> SubDagOperator or simply remove it.
>
> Any thoughts?
>
> Cheers,
> Bin
>
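The `task >> task_group >> task` idea can be illustrated with a minimal, self-contained sketch. The `Task` and `TaskGroup` classes below are hypothetical stand-ins written for this example, not Airflow's API. An edge pointing into a group is attached to the group's root tasks (no upstream inside the group), and an edge leaving a group starts from its leaf tasks (no downstream inside the group). This is one plausible way to get the `start >> [task1, task2] >> task3` equivalence from the snippet.

```python
class _Node:
    """Shared `>>` behaviour for tasks and groups (hypothetical, not Airflow)."""

    def __rshift__(self, other):
        _link(self, other)
        return other

    def __rrshift__(self, other):
        # Called for `[a, b] >> node`, because list does not define `>>`.
        _link(other, self)
        return self


class Task(_Node):
    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = set()    # ids of direct upstream tasks
        self.downstream = set()  # ids of direct downstream tasks

    def roots(self):
        return [self]

    def leaves(self):
        return [self]


class TaskGroup(_Node):
    """A plain container of tasks with no scheduling logic of its own."""

    def __init__(self, group_id):
        self.group_id = group_id
        self.tasks = []

    def add_task(self, task):
        self.tasks.append(task)
        return task

    def _ids(self):
        return {t.task_id for t in self.tasks}

    def roots(self):
        # Tasks with no upstream task inside this group.
        ids = self._ids()
        return [t for t in self.tasks if not (t.upstream & ids)]

    def leaves(self):
        # Tasks with no downstream task inside this group.
        ids = self._ids()
        return [t for t in self.tasks if not (t.downstream & ids)]


def _expand(node, side):
    """Resolve a task, a group, or a list of them into concrete tasks."""
    if isinstance(node, list):
        return [t for item in node for t in _expand(item, side)]
    return node.roots() if side == "roots" else node.leaves()


def _link(left, right):
    # Resolve both ends before adding edges, so new edges cannot change
    # which tasks count as roots or leaves while we loop.
    ups, downs = _expand(left, "leaves"), _expand(right, "roots")
    for up in ups:
        for down in downs:
            up.downstream.add(down.task_id)
            down.upstream.add(up.task_id)


group = TaskGroup("download")
task1 = group.add_task(Task("task1"))
task2 = group.add_task(Task("task2"))
task3 = group.add_task(Task("task3"))
[task1, task2] >> task3        # dependencies inside the group first

start, end = Task("start"), Task("end")
start >> group >> end          # fans out to the group's roots and leaves

print(sorted(start.downstream))  # ['task1', 'task2']
print(sorted(end.upstream))      # ['task3']
```

Roots and leaves are resolved when `>>` is evaluated, so in this sketch a group's internal dependencies must be declared before the group is wired to outside tasks; a real implementation might defer that resolution until the DAG is finalized.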
> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>
> +1, proposal looks good.
>
> The original intention was really to have task groups and a zoom-in/out
> in the UI. The original reasoning was to reuse the DAG object since it
> is a group of tasks, but as highlighted here it does create underlying
> confusion, since a DAG is much more than just a group of tasks.
>
> Max
>
> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi <joshipoornima06@gmail.com> wrote:
>
> Thank you for your email.
>
> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang <bin.huangxb@gmail.com> wrote:
>
> > > - *Unpack SubDags during dag parsing*: This rewrites the
> > > *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > > give a flat structure at the task level
> >
> > The serialized_dag representation already does this I think. At least
> > if I've understood your idea here correctly.
>
> I am not sure about the serialized_dag representation, but won't it at
> least still keep the subdag entry in the DAG table? In my proposal, as
> also in the draft PR, the idea is to *extract the tasks from the subdag
> and add them back to the root_dag.* So the runtime DAG graph will look
> exactly the same as without a subdag, but with metadata attached to
> those sections. This metadata will later be used for rendering in the
> UI. So after parsing (*DagBag.process_file()*), it will just output the
> *root_dag* instead of *root_dag + subdag + subdag + nested subdag*, etc.
>
> - e.g. section-1-* will have the metadata current_group=section-1,
> parent_group=<the-root-dag-id> (naming suggestions welcome). The reason
> for parent_group is that we can have nested groups and still be able to
> capture the dependency.
>
> Runtime DAG:
> [image: image.png]
>
> In the UI, by utilizing the metadata, what we see would be something
> like this, and then we can expand or zoom into it in some way.
> [image: image.png]
>
> The benefits I can see are:
> 1. We don't need to deal with the extra complexity of SubDag for
> execution and scheduling. It will be the same as not using SubDag.
> 2. We still have the benefits of modularized and reusable DAG code and
> can declare dependencies between the pieces. And with the new
> SubDagOperator (see the AIP or draft PR), we can use the same
> dag_factory function to generate one DAG, many dynamic DAGs, or a
> SubDag (in which case it will just extract all underlying tasks and
> append them to the root DAG).
>
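The unpacking step described above (pull every task out of each subdag, append it to the root DAG, and tag it with `current_group`/`parent_group` metadata for the UI) can be sketched in plain Python. The `Group` and `FlatTask` classes, the `flatten` helper, and the choice to give root-level tasks `parent_group=None` are assumptions for illustration; this is not the code from the draft PR.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """Hypothetical stand-in for a subdag: its own tasks plus nested groups."""
    group_id: str
    task_ids: List[str] = field(default_factory=list)
    children: List["Group"] = field(default_factory=list)


@dataclass
class FlatTask:
    """A task appended to the root DAG, plus UI-only grouping metadata."""
    task_id: str
    current_group: str
    parent_group: Optional[str]


def flatten(group: Group, parent_group: Optional[str] = None) -> List[FlatTask]:
    """Depth-first walk that turns nested groups into one flat task list.

    Only this flat list would reach the scheduler; current_group and
    parent_group exist so the UI can rebuild the nesting later.
    """
    flat = [FlatTask(t, group.group_id, parent_group) for t in group.task_ids]
    for child in group.children:
        flat.extend(flatten(child, parent_group=group.group_id))
    return flat


root = Group(
    "root_dag",
    task_ids=["start", "end"],
    children=[
        Group("section-1", ["section-1-task-1", "section-1-task-2"],
              children=[Group("section-1a", ["section-1a-task-1"])]),
        Group("section-2", ["section-2-task-1"]),
    ],
)

for t in flatten(root):
    print(t.task_id, t.current_group, t.parent_group)
# start root_dag None
# end root_dag None
# section-1-task-1 section-1 root_dag
# section-1-task-2 section-1 root_dag
# section-1a-task-1 section-1a section-1
# section-2-task-1 section-2 root_dag
```

Task dependencies are left out to keep the sketch short; since the runtime graph is meant to look exactly as it would without subdags, they would presumably be copied onto the root DAG unchanged.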
> - This leads to Ash's idea of replacing subdags with a simpler concept:
> the proposed change basically drains out the contents of a SubDag, so it
> becomes more like an ExtractSubdagTasksAndAppendToRootdagOperator
> (forgive the crazy name...). In that case, is it still necessary to keep
> the concept of a subdag, given that it is nothing more than a name?
>
> That's why the TaskGroup idea came up. Thanks to Chris Palmer for
> helping conceptualize the functionality of TaskGroup; I will just paste
> it here.
>
> > - Tasks can be added to a TaskGroup
> > - You *can* have dependencies between Tasks in the same TaskGroup, but
> > *cannot* have dependencies between a Task in a TaskGroup and either a
> > Task in a different TaskGroup or a Task not in any group
> > - You *can* have dependencies between a TaskGroup and either other
> > TaskGroups or Tasks not in any group
> > - The UI will by default render a TaskGroup as a single "object", but
> > one which you can expand or zoom into in some way
> > - You'd need some way to determine what the "status" of a TaskGroup
> > was, at least for UI display purposes
>
> I agree with Chris:
> - From the backend's view (scheduler & executor), I think TaskGroup
> should be ignored during execution (unless we decide to implement some
> metadata operations that allow starting/stopping a group of tasks, etc.).
> - From the UI's view, it should be able to pick up the individual tasks'
> statuses and then determine the TaskGroup's status.
>
> Bin
>
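Deriving a TaskGroup's status from its tasks' statuses, as suggested above, needs some precedence rule, and the thread does not specify one. The sketch below assumes one possible rule: any failure wins, then anything in progress, then all-finished, otherwise not started. The state strings are loosely modeled on Airflow task-instance states, and `group_status` is a hypothetical helper; a real UI might well choose differently.

```python
from typing import Iterable, Optional

# Assumed state groupings, loosely modeled on Airflow task-instance states;
# this precedence is illustrative and not taken from Airflow itself.
FAILED = {"failed", "upstream_failed"}
IN_PROGRESS = {"running", "queued", "scheduled", "up_for_retry"}
DONE = {"success", "skipped"}


def group_status(states: Iterable[Optional[str]]) -> str:
    """Collapse the states of a group's tasks into one label for the UI.

    None stands for a task that has not been scheduled yet.
    """
    states = list(states)
    if not states:
        return "no_status"
    if any(s in FAILED for s in states):
        return "failed"              # one failure marks the whole group
    if any(s in IN_PROGRESS for s in states):
        return "running"
    if all(s in DONE for s in states):
        # A fully skipped group is shown as skipped, not as success.
        return "skipped" if all(s == "skipped" for s in states) else "success"
    # Mix of finished and not-yet-started tasks: the group is still under way.
    return "running" if any(s in DONE for s in states) else "no_status"


print(group_status(["success", "running", None]))  # running
print(group_status(["success", "failed"]))         # failed
print(group_status(["success", "skipped"]))        # success
print(group_status([None, None]))                  # no_status
```

Because the aggregation only reads task states, it fits the "ignored during execution" view: the scheduler never needs to know the group exists.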
> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
>
> I hadn’t thought about using the `>>` operator to tie DAGs together,
> but I think that sounds pretty great! I wonder if we could essentially
> build in the ability to set dependencies on all the starter tasks of
> that DAG.
>
> I’m personally OK with SubDag being a mostly UI concept. It doesn’t need
> to execute separately; you’re just adding more tasks to the queue that
> will be executed when there are resources available.
>
> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
>
> I agree that SubDAGs are an overly complex abstraction. I think what is
> needed/useful is a TaskGroup concept. At a high level, I think you want
> this functionality:
>
> - Tasks can be added to a TaskGroup
> - You *can* have dependencies between Tasks in the same TaskGroup, but
> *cannot* have dependencies between a Task in a TaskGroup and either a
> Task in a different TaskGroup or a Task not in any group
> - You *can* have dependencies between a TaskGroup and either other
> TaskGroups or Tasks not in any group
> - The UI will by default render a TaskGroup as a single "object", but
> one which you can expand or zoom into in some way
> - You'd need some way to determine what the "status" of a TaskGroup
> was, at least for UI display purposes
>
> Not sure if it would need to be a top-level object with its own database
> table and model, or just another attribute on tasks. I think you could
> build it in such a way that, from the scheduler's point of view, a DAG
> with TaskGroups doesn't get treated any differently. So it really just
> becomes a shortcut for setting dependencies between sets of Tasks, and
> allows the UI to simplify the rendering of the DAG structure.
>
> Chris
>
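Chris's list amounts to a rule about which dependency edges are legal. A minimal sketch of one reading follows: a task-to-task edge is allowed only when both tasks are in the same group (or both are ungrouped), a group-to-group edge is allowed between distinct groups, and a mixed group/task edge is allowed only when the task is ungrouped. The `edge_allowed` function, the `task_id -> group_id` membership mapping, and the `("task", id)` / `("group", id)` endpoint tuples are hypothetical names chosen for this example.

```python
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, str]  # ("task", task_id) or ("group", group_id)


def edge_allowed(up: Endpoint, down: Endpoint,
                 membership: Dict[str, Optional[str]]) -> bool:
    """Return True if `up >> down` is allowed under the rules in the list."""
    (up_kind, up_id), (down_kind, down_id) = up, down
    if up_kind == "task" and down_kind == "task":
        # Task-to-task: same group, or both ungrouped (None == None).
        return membership.get(up_id) == membership.get(down_id)
    if up_kind == "group" and down_kind == "group":
        # Group-to-group: any two distinct groups.
        return up_id != down_id
    # Mixed group/task edge: the task side must not belong to any group.
    task_id = down_id if up_kind == "group" else up_id
    return membership.get(task_id) is None


membership = {"extract": "etl", "load": "etl", "report": None, "train": "ml"}

checks = [
    (("task", "extract"), ("task", "load")),  # same group
    (("task", "load"), ("task", "report")),   # grouped -> ungrouped task
    (("task", "load"), ("task", "train")),    # across groups
    (("group", "etl"), ("group", "ml")),      # group -> group
    (("group", "etl"), ("task", "report")),   # group -> ungrouped task
    (("group", "etl"), ("task", "train")),    # group -> task in another group
]
for up, down in checks:
    print(up, ">>", down, edge_allowed(up, down, membership))
# prints True, False, False, True, True, False (in that order)
```

Since groups carry no scheduling logic in this proposal, a check like this could run once at parse time, after which the scheduler would only see ordinary task-level edges.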
> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid> wrote:
>
> Agree with James (and I think it's actually the more important issue to
> fix), but I am still convinced Ash's idea is the right way forward (it
> just might require a bit more work to deprecate than adding visual
> grouping in the UI).
>
> There was a previous thread about this, FYI, with more context on why
> subdags are bad and potential solutions:
> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
>
> A solution I outline there to James's problem is, e.g., enabling the `>>`
> operator for Airflow operators to work with DAGs as well. I see this
> being separate from Ash's solution for DAG grouping in the UI, but it is
> one of the two items required to replace all existing subdag
> functionality.
>
> I've been working with subdags for 3 years and they are always a giant
> pain to use. They are a constant source of user confusion and breakages
> during upgrades. Would love to see them gone :).
>
> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com> wrote:
>
> I'm not sure I totally agree it's just a UI concept. I use the subdag
> operator to simplify dependencies too. If you have a group of tasks that
> need to finish before another group of tasks starts, using a subdag is a
> pretty quick way to set those dependencies, and I think it also makes
> the DAG code easier to follow.
>
> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com> wrote:
>
> I second Ash’s grouping concept.
>
> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
>
> Question:
>
> Do we even need the SubDagOperator anymore?
>
> Would removing it entirely and just replacing it with a UI grouping
> concept be conceptually simpler, less to get wrong, and closer to what
> users actually want to achieve with subdags?
>
> With your proposed change, tasks in subdags could start running in
> parallel (a good change) -- so should we not also just _entirely_ remove
> the concept of a sub dag and replace it with something simpler?
>
> Problems with subdags (I think; I haven't used them extensively, so I
> may be wrong on some of these):
> - They need their own dag_id, but it has(?) to be of the form
> `parent_dag_id.subdag_id`.
> - They need their own schedule_interval, but it has to match the parent
> dag.
> - Sub dags can be paused on their own. (Does it make sense to do this?
> Pausing just a sub dag would mean the sub dag would never execute, so
> the SubDagOperator would fail too.)
> - You had to choose the executor to operate a subdag with -- always a
> bit of a kludge.
>
> Thoughts?
>
> -ash
>
> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
>
> Work on sub-dags is much needed; I'm excited to see how this progresses.
>
> > - *Unpack SubDags during dag parsing*: This rewrites the
> > *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > give a flat structure at the task level
>
> The serialized_dag representation already does this I think. At least if
> I've understood your idea here correctly.
>
> -ash
>
> > > On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Sending a message to everyone to collect feedback on AIP-34 on
> > > > rewriting SubDagOperator. This was previously briefly mentioned in the
> > > > discussion about what needs to be done for Airflow 2.0, and one of the
> > > > ideas is to make SubDagOperator attach tasks back to the root DAG.
> > > >
> > > > AIP-34 focuses on solving SubDagOperator-related issues by reattaching
> > > > all tasks back to the root DAG while respecting dependencies during
> > > > parsing. The original grouping effect on the UI will be achieved by
> > > > grouping related tasks through metadata.
> > > >
> > > > This also makes the dag_factory function more reusable because you
> > > > don't need to have parent_dag_name and child_dag_name in the function
> > > > signature anymore.
> > > >
> > > > Changes proposed:
> > > >
> > > > - *Unpack SubDags during dag parsing*: This rewrites the
> > > >   *DagBag.bag_dag* method to unpack subdag while parsing, and it will
> > > >   give a flat structure at the task level
> > > > - *Simplify SubDagOperator*: The new SubDagOperator acts like a
> > > >   container and most of the original methods are removed. The
> > > >   signature is also changed to *subdag_factory* with *subdag_args* and
> > > >   *subdag_kwargs*. This is similar to the PythonOperator signature.
> > > > - *Add a TaskGroup model and add current_group & parent_group
> > > >   attributes to BaseOperator*: This metadata is used to group tasks
> > > >   for rendering at the UI level. It may potentially extend further to
> > > >   group arbitrary tasks outside the context of subdag to allow
> > > >   group-level operations (e.g. stop/trigger a group of tasks within
> > > >   the dag)
> > > > - *Webserver UI for SubDag*: Proposed UI modification to allow
> > > >   (un)collapsing a group of tasks for a flat structure, to pair with
> > > >   the first change instead of the original hierarchical structure.
> > > >
> > > > Please see related documents and PRs for details:
> > > > AIP:
> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > >
> > > > Original Issue: https://github.com/apache/airflow/issues/8078
> > > > Draft PR: https://github.com/apache/airflow/pull/9243
> > > >
> > > > Please let me know if there are any aspects that you agree/disagree
> > > > with or need more clarification on (especially the third change
> > > > regarding TaskGroup). Any comments are welcome and I am looking
> > > > forward to them!
> > > >
> > > > Cheers
> > > > Bin
> > > >
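A minimal, framework-free Python sketch of the first and third proposed changes: nested groups are unpacked into one flat task mapping, and each task keeps only `current_group`/`parent_group` as metadata a UI could use to regroup it. The `Task`/`Group` classes and the `flatten` helper are hypothetical stand-ins, not Airflow's `DagBag.bag_dag` or `BaseOperator`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Task:
    """Hypothetical stand-in for an operator."""
    task_id: str
    current_group: Optional[str] = None  # innermost group containing the task
    parent_group: Optional[str] = None   # group enclosing current_group, if any


@dataclass
class Group:
    """Hypothetical stand-in for a SubDag/TaskGroup container."""
    group_id: str
    children: List[Union["Group", Task]] = field(default_factory=list)


def flatten(children: List[Union[Group, Task]],
            current: Optional[str] = None,
            parent: Optional[str] = None,
            flat: Optional[Dict[str, Task]] = None) -> Dict[str, Task]:
    """Unpack nested groups into one flat {task_id: Task} mapping; grouping
    survives only as metadata on each task."""
    if flat is None:
        flat = {}
    for child in children:
        if isinstance(child, Group):
            # The child group becomes current; the old current becomes its parent.
            flatten(child.children, child.group_id, current, flat)
        else:
            if child.task_id in flat:
                raise ValueError(f"duplicate task_id: {child.task_id}")
            child.current_group, child.parent_group = current, parent
            flat[child.task_id] = child
    return flat


root = [
    Task("start"),
    Group("section_1", [Task("extract"), Group("inner", [Task("load")])]),
    Task("end"),
]
tasks = flatten(root)
print(sorted(tasks))  # ['end', 'extract', 'load', 'start']
print(tasks["load"].current_group, tasks["load"].parent_group)  # inner section_1
```

The flat mapping is what a scheduler would see, while the UI could rebuild the hierarchy from the two metadata fields alone.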
> --
> Kyle Hamlin
>
> --
> Thanks & Regards
> Poornima
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
>
> --
>
> *Jacob Ferriero*
>
> Strategic Cloud Engineer: Data Engineering
>
> jferriero@google.com
>
> 617-714-2509

Re: [AIP-34] Rewrite SubDagOperator

Posted by Gerard Casas Saez <gc...@twitter.com.INVALID>.
As I mentioned in the issue, I believe prefixing task_id with group_id is a
nice thing, as it makes TaskGroup an equivalent of SubDagOperator.
Internally we have a concept similar to TaskGroup, called
FlattenedSubDagOperator, that adds the group_id to the task_id.

One of the main internal uses of this operator is hyperparameter tuning of
ML models. For that we provide an abstraction where users provide a SubDag
that takes in a dictionary of hyperparameters (through XComArg) and pushes
an XCom that is a dictionary of metrics. This task group is usually a
combination of model training and model analysis, but it can be whatever
you want. We easily create a hyperparameter tuning DAG for the user by
instantiating this SubDag/TaskGroup once for each experiment they need to
perform.

Now, if group_id is not added to task_id, this type of reuse of task groups
would not be possible. You would need to ask the user to parametrize
task_id, and that's a bit counterintuitive as Airflow task_ids are not
templated. Another option is to make this behaviour customizable and have a
flag that activates it on the TaskGroup.
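The reuse pattern described above can be sketched without Airflow: a hypothetical factory builds the same train/analyze tasks once per experiment, and the resulting task_ids are checked for duplicates. The helper names, task labels, and the "." separator are assumptions for illustration only.

```python
from typing import List


def build_experiment(group_id: str, prefix: bool) -> List[str]:
    """Hypothetical factory returning the task_ids one experiment would create."""
    labels = ["train_model", "analyze_model"]
    return [f"{group_id}.{label}" if prefix else label for label in labels]


def build_tuning_dag(n_experiments: int, prefix: bool) -> List[str]:
    """Instantiate the same group once per hyperparameter set."""
    task_ids: List[str] = []
    for i in range(n_experiments):
        task_ids += build_experiment(f"experiment_{i}", prefix)
    return task_ids


def find_collisions(task_ids: List[str]) -> List[str]:
    """Return each task_id that appears more than once, in first-repeat order."""
    seen, dupes = set(), []
    for task_id in task_ids:
        if task_id in seen and task_id not in dupes:
            dupes.append(task_id)
        seen.add(task_id)
    return dupes


with_prefix = build_tuning_dag(3, prefix=True)
without_prefix = build_tuning_dag(3, prefix=False)
print(find_collisions(with_prefix))     # []
print(find_collisions(without_prefix))  # ['train_model', 'analyze_model']
```

Without the prefix, the only way to avoid the collisions is to thread something like the experiment index into every label by hand.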


Gerard Casas Saez
Twitter | Cortex | @casassaez <http://twitter.com/casassaez>


On Tue, Sep 1, 2020 at 1:03 AM Yu Qian <yu...@gmail.com> wrote:

> The vote for AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> passed. However, there's an interesting discussion going on here
> <https://github.com/apache/airflow/pull/10153#discussion_r480247681>
> regarding whether task_id should be automatically prefixed with the
> group_id of its TaskGroup, so I'm bringing it up in this email thread for
> discussion.
>
> Plan A: Prefix task_id with the group_id of the TaskGroup. This is the
> original plan in AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> The task_id argument passed to an operator only needs to be unique within
> the TaskGroup. The actual task_id is prefixed with the group_id, so task_id
> is guaranteed to be unique across the DAG.
>
> Plan B: Do not prefix task_id with the group_id of the TaskGroup. The
> task_id argument passed to the operator is the actual task_id, so the user
> must make sure task_id is unique across the whole DAG.
>
> Obviously the convenience of Plan A is not free of charge. I'm summarizing
> some of the pros and cons in the table below. There are two examples at the
> bottom illustrating the different usage. I was convinced by houqp in the
> GitHub comments, and by some of my own experiments, that Plan B has more
> advantages and avoids surprises. I'm going to update AIP-34
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>
> according to Plan B unless I hear strong objections before 2020-09-03
> 07:00 UTC.
>
> Ease of Use
>   Plan A: Easier to use for new DAGs.
>   Plan B: Slightly more work on the user to maintain task_id uniqueness.
>
> Implementation
>   Plan A: A little more complicated. Each group needs to know its parent's
>           group_id in order to prefix the group_id correctly.
>   Plan B: Simpler. No need to know the parent TaskGroup's group_id.
>
> Ease of Migration
>   Plan A: task_id will change if TaskGroup is introduced into an existing
>           DAG. Existing tasks put into a TaskGroup will appear like new
>           tasks if the DAG already has some historical DagRuns. This may
>           pose a barrier to adoption of TaskGroup.
>   Plan B: No change in task_id when an existing task is put into a
>           TaskGroup. Migrating existing DAGs to adopt TaskGroup will be
>           easier.
>
> Actual task_id
>   Plan A: Actual task_ids tend to be longer because they are always
>           prefixed with group_id, especially if the task is in a nested
>           TaskGroup.
>   Plan B: Actual task_ids tend to be shorter because users control the
>           actual task_id themselves.
>
> Graph label
>   Plan A: Labels in Graph View tend to be shorter because task_id only
>           needs to be unique within the TaskGroup.
>   Plan B: Labels in Graph View tend to be longer because they display the
>           actual task_id, which is unique across the DAG.
>
> Plan A Example:
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.dates import days_ago
> from airflow.utils.task_group import TaskGroup
>
>
> def create_section():
>     # Under Plan A, these task_ids only need to be unique within their
>     # TaskGroup; the group_id is prefixed automatically.
>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(5)]
>
>     with TaskGroup("inside_section_1") as inside_section_1:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
>
>     with TaskGroup("inside_section_2") as inside_section_2:
>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(3)]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
>         create_section()
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
>         create_section()
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
>
> Plan B Example:
>
> from airflow import DAG
> from airflow.operators.dummy_operator import DummyOperator
> from airflow.utils.dates import days_ago
> from airflow.utils.task_group import TaskGroup
>
>
> def create_section(section_num):
>     # Under Plan B, every task_id must be unique across the whole DAG, so
>     # section_num is embedded in each task_id and group_id by hand.
>     dummies = [
>         DummyOperator(task_id=f'task-{section_num}.{i + 1}') for i in range(5)
>     ]
>
>     with TaskGroup(f"section_{section_num}.1") as inside_section_1:
>         _ = [
>             DummyOperator(task_id=f'task-{section_num}.1.{i + 1}')
>             for i in range(3)
>         ]
>
>     with TaskGroup(f"section_{section_num}.2") as inside_section_2:
>         _ = [
>             DummyOperator(task_id=f'task-{section_num}.2.{i + 1}')
>             for i in range(3)
>         ]
>
>     dummies[-1] >> inside_section_1
>     dummies[-2] >> inside_section_2
>     inside_section_1 >> inside_section_2
>
>
> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
>     start = DummyOperator(task_id="start")
>
>     with TaskGroup("section_1", tooltip="Tasks for Section 1") as section_1:
>         create_section(1)
>
>     some_other_task = DummyOperator(task_id="some-other-task")
>
>     with TaskGroup("section_2", tooltip="Tasks for Section 2") as section_2:
>         create_section(2)
>
>     end = DummyOperator(task_id='end')
>
>     start >> section_1 >> some_other_task >> section_2 >> end
>
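The "Implementation" and "Actual task_id" rows of the comparison can be illustrated with a small hypothetical helper (not the code in the PR): under Plan A the actual task_id is built from every enclosing group_id, so each group needs its ancestors' ids, while under Plan B the label is used as-is. A "." separator is assumed.

```python
from typing import Sequence


def actual_task_id(label: str, group_path: Sequence[str], plan: str) -> str:
    """Return the actual task_id of a task created with task_id=label.

    group_path lists the enclosing group_ids from outermost to innermost.
    Under Plan A the whole chain is needed, which is why each group has to
    know its parent's group_id; under Plan B it is ignored.
    """
    if plan == "A":
        return ".".join([*group_path, label])
    if plan == "B":
        return label
    raise ValueError(f"unknown plan: {plan!r}")


# task-1 of inside_section_1 in section_1, as in the Plan A example.
plan_a = actual_task_id("task-1", ["section_1", "inside_section_1"], plan="A")
# The corresponding task in the Plan B example, whose id the user spells out.
plan_b = actual_task_id("task-1.1.1", ["section_1", "section_1.1"], plan="B")
print(plan_a, len(plan_a))  # section_1.inside_section_1.task-1 33
print(plan_b, len(plan_b))  # task-1.1.1 10
```

Each extra level of nesting adds another group_id to the Plan A id, which is where the longer ids (and the shorter Graph View labels) come from.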
>
> On Sat, Aug 22, 2020 at 1:02 AM Gerard Casas Saez
> <gc...@twitter.com.invalid> wrote:
>
> > Agree on this being non-blocking.
> >
> > Regarding moving to a vote, you can take care of it. Just open a new
> > email thread on the dev list and call for a vote. You can see this
> > example from Tomek for AIP-31:
> >
> > https://lists.apache.org/thread.html/r16ee2b6263f5849a2e2b1b6c3ba39fb3d643022f052216cb19a0a8da%40%3Cdev.airflow.apache.org%3E
> >
> > Best,
> >
> >
> > Gerard Casas Saez
> > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> >
> >
> > On Thu, Aug 20, 2020 at 7:10 PM Yu Qian <yu...@gmail.com> wrote:
> >
> > > Hi, Gerard, yes I agree it's possible to do this at the UI level
> > > without any fundamental change to the implementation. If expand_group()
> > > sees that two groups are fully connected (i.e. every task in one parent
> > > group depends on every task in another parent group), it can decide to
> > > collapse all those child edges into a single edge between the parent
> > > groups to reduce the burden on the layout() function. However, I did not
> > > find any existing algorithm to do this within dagre, so we'll likely
> > > need to implement this ourselves. Another hiccup is that at the moment
> > > it doesn't seem to be possible to call setEdge() between two parent
> > > groups (aka clusters). If someone has ideas on how to do this, please
> > > feel free to contribute.
> > >
> > > One other consideration is that this example is only an extreme case.
> > > There are other in-between cases that still require user intervention.
> > > Let's say 90% of tasks in group1 depend on 90% of tasks in group2 and
> > > both groups have more than 100 tasks. This will still cause a lot of
> > > edges on the graph, and it's even harder to reduce because the parent
> > > groups are not fully connected, so it's inaccurate to reduce them to a
> > > single edge between the parents. In those cases, the user may still need
> > > to do something themselves, e.g. adding some DummyOperator to the DAG to
> > > cut down the edges. There will be some tradeoff because DummyOperator
> > > takes a short while to execute, like you mentioned.
> > >
> > > There is a lot of room for improvement, but I don't think that's a
> > > blocking issue for this AIP? So if you can move it to the voting stage,
> > > that'll be fantastic.
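A pure-Python sketch of the reduction described above, assuming groups are disjoint sets of task_ids: a pair of groups whose child edges form a complete bipartite set is replaced by one group-level edge, and partially connected pairs (like the 90% case) are left alone. This is neither dagre nor PR code; `collapse_full_bipartite` is a hypothetical name.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def collapse_full_bipartite(edges: Set[Edge], groups: Dict[str, Set[str]]) -> Set[Edge]:
    """Replace the child edges between two groups with one group-level edge
    when every task in the downstream group depends on every task in the
    upstream group. Partially connected pairs are kept as-is."""
    result = set(edges)
    for up_id, up_tasks in groups.items():
        for down_id, down_tasks in groups.items():
            if up_id == down_id:
                continue
            between = {(u, d) for u in up_tasks for d in down_tasks}
            # Only a complete set of child edges can be merged without losing accuracy.
            if between and between <= edges:
                result -= between
                result.add((up_id, down_id))
    return result


groups = {
    "group1": {f"g1_t{i}" for i in range(3)},
    "group2": {f"g2_t{i}" for i in range(3)},
    "group3": {f"g3_t{i}" for i in range(3)},
}
# group1 -> group2 is fully connected (3 x 3 = 9 edges);
# group2 -> group3 is only partially connected (2 edges).
edges = {(u, d) for u in groups["group1"] for d in groups["group2"]}
edges |= {("g2_t0", "g3_t0"), ("g2_t1", "g3_t1")}

collapsed = collapse_full_bipartite(edges, groups)
print(len(edges), len(collapsed))  # 11 3
```

The check is exact by design: a pair that is only mostly connected keeps all of its child edges, which is why the in-between cases still need user intervention.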
> > >
> > >
> > > On Thu, Aug 20, 2020 at 4:21 PM 耀 周 <zh...@icloud.com.invalid>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > > On Aug 18, 2020, at 23:55, Gerard Casas Saez
> > > > > <gcasassaez@twitter.com.INVALID> wrote:
> > > > >
> > > > > Is it not possible to solve this at the UI level? I.e. tell dagre to
> > > > > only add 1 edge to the group instead of to all nodes in the group? No
> > > > > need to do SubDag behaviour, but just reduce the edges on the graph.
> > > > > That should reduce load time if I understand correctly.
> > > > >
> > > > > I would strongly avoid the Dummy operator since it will introduce
> > > > > delays in operator execution (as it will need to execute 1 dummy
> > > > > operator, and that can be expensive imo).
> > > > >
> > > > > Overall though, the proposal looks good. Unless anyone opposes it, I
> > > > > would move this to vote mode :D
> > > > >
> > > > > Gerard Casas Saez
> > > > > Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >
> > > > >
> > > > > On Mon, Aug 17, 2020 at 9:56 AM Yu Qian <yu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >> Hi, All,
> > > > >> Here's the updated AIP-34
> > > > >> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+TaskGroup%3A+A+UI+task+grouping+concept+as+an+alternative+to+SubDagOperator>.
> > > > >> The PR has been fine-tuned with better UI interactions and added
> > > > >> serialization of TaskGroup: https://github.com/apache/airflow/pull/10153
> > > > >>
> > > > >> Here are some experiment results:
> > > > >> A made-up DAG containing 403 tasks and 5696 edges, grouped like this.
> > > > >> Note that inside_section_2 is intentionally made to depend on all
> > > > >> tasks in inside_section_1 to generate a large number of edges. The
> > > > >> observation is that opening the top-level graph is very quick, around
> > > > >> 270ms. Expanding groups that don't have a lot of dense dependencies on
> > > > >> other groups is also hardly noticeable, e.g. expanding section_1 takes
> > > > >> 330ms. The part that takes time is expanding both groups
> > > > >> inside_section_1 and inside_section_2. Because there are 2500 edges
> > > > >> between these two inner groups, it took 63 seconds to expand both of
> > > > >> them. The majority of the time (more than 62 seconds) is actually
> > > > >> taken by the layout() function in dagre. In other words, it's very
> > > > >> fast to add nodes and edges, but laying them out on the graph takes
> > > > >> time. This issue is not specific to TaskGroup. Without TaskGroup, if a
> > > > >> DAG contains too many edges, it takes time to lay out the graph too.
> > > > >>
> > > > >> On the other hand, a more realistic experiment with a production DAG
> > > > >> containing about 400 tasks and 700 edges showed that grouping tasks
> > > > >> into three levels of nested TaskGroup cut the upfront page opening
> > > > >> time from around 6s to 500ms. (Obviously the time is paid back when
> > > > >> the user gradually expands all the groups one by one, but normally
> > > > >> people don't need to expand every group every time, so it's still a
> > > > >> big saving.) The experiments were done on OS X Mojave, 2.2 GHz Intel
> > > > >> Core i7, 16GB memory, Chrome.
> > > > >>
> > > > >> I can see a few possible improvements to TaskGroup (or how it's used)
> > > > >> that can be done as a next step:
> > > > >> 1) Like Gerard suggested, we can implement lazy loading. Instead of
> > > > >> displaying the whole DAG, we can limit the Graph View to show only a
> > > > >> single TaskGroup, omitting its edges going out to other TaskGroups.
> > > > >> This behaviour is more like SubDagOperator, where users can zoom
> > > > >> into/out of a TaskGroup and look at only the tasks within that
> > > > >> TaskGroup as if those are the only tasks on the DAG. This can be done
> > > > >> with either background JavaScript calls or by making a new GET
> > > > >> request with filtering parameters. Obviously the downside is that
> > > > >> it's not as explicit as showing all the dependencies on the graph.
> > > > >> 2) Users can improve the organization of the DAG themselves to reduce
> > > > >> the number of edges. E.g. if every task in group2 depends on every
> > > > >> task in group1, instead of doing group1 >> group2, they can add a
> > > > >> DummyOperator in between and do this: group1 >> dummy >> group2. This
> > > > >> cuts down the number of edges significantly and page load becomes
> > > > >> much faster.
> > > > >> 3) If we really want, we can improve the >> operator of TaskGroup to
> > > > >> do 2) automatically. If it sees that both sides of >> are TaskGroups,
> > > > >> it can create a DummyOperator on behalf of the user. The downside is
> > > > >> that it may be too much magic.
> > > > >>
> > > > >> Thanks,
> > > > >> Qian
> > > > >>
> > > > >> from airflow import DAG
> > > > >> from airflow.operators.dummy_operator import DummyOperator
> > > > >> from airflow.utils.dates import days_ago
> > > > >> from airflow.utils.task_group import TaskGroup
> > > > >>
> > > > >>
> > > > >> def create_section():
> > > > >>     """
> > > > >>     Create tasks in the outer section.
> > > > >>     """
> > > > >>     dummies = [DummyOperator(task_id=f'task-{i + 1}') for i in range(100)]
> > > > >>
> > > > >>     with TaskGroup("inside_section_1") as inside_section_1:
> > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > >>
> > > > >>     with TaskGroup("inside_section_2") as inside_section_2:
> > > > >>         _ = [DummyOperator(task_id=f'task-{i + 1}') for i in range(50)]
> > > > >>
> > > > >>     dummies[-1] >> inside_section_1
> > > > >>     dummies[-2] >> inside_section_2
> > > > >>     # Every task in inside_section_2 depends on every task in
> > > > >>     # inside_section_1: 50 x 50 = 2500 edges.
> > > > >>     inside_section_1 >> inside_section_2
> > > > >>
> > > > >>
> > > > >> with DAG(dag_id="example_task_group", start_date=days_ago(2)) as dag:
> > > > >>     start = DummyOperator(task_id="start")
> > > > >>
> > > > >>     with TaskGroup("section_1") as section_1:
> > > > >>         create_section()
> > > > >>
> > > > >>     some_other_task = DummyOperator(task_id="some-other-task")
> > > > >>
> > > > >>     with TaskGroup("section_2") as section_2:
> > > > >>         create_section()
> > > > >>
> > > > >>     end = DummyOperator(task_id='end')
> > > > >>
> > > > >>     start >> section_1 >> some_other_task >> section_2 >> end
> > > > >>
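To sanity-check the edge counts and suggestion 2), this framework-free sketch rebuilds the edge set of the example DAG above. It assumes that `a >> b` between groups connects the leaves of `a` (tasks with no downstream inside `a`) to the roots of `b` (tasks with no upstream inside `b`); under that assumption it reproduces the 5696 edges quoted in this message. The `join` task is a hypothetical stand-in for the DummyOperator from suggestion 2).

```python
from itertools import product
from typing import List, Set, Tuple

Edge = Tuple[str, str]


def roots(tasks: List[str], edges: Set[Edge]) -> List[str]:
    """Tasks with no upstream edge coming from inside `tasks`."""
    inside = set(tasks)
    fed = {d for u, d in edges if u in inside and d in inside}
    return [t for t in tasks if t not in fed]


def leaves(tasks: List[str], edges: Set[Edge]) -> List[str]:
    """Tasks with no downstream edge going to inside `tasks`."""
    inside = set(tasks)
    feeding = {u for u, d in edges if u in inside and d in inside}
    return [t for t in tasks if t not in feeding]


def connect(up: List[str], down: List[str], edges: Set[Edge]) -> None:
    """Model `up >> down`: leaves of `up` feed roots of `down` (assumed semantics)."""
    edges.update(product(leaves(up, edges), roots(down, edges)))


def build_section(name: str, edges: Set[Edge], use_dummy: bool) -> List[str]:
    """Mirror create_section(), optionally with a join task between the inner groups."""
    dummies = [f"{name}.task-{i + 1}" for i in range(100)]
    inner_1 = [f"{name}.inside_section_1.task-{i + 1}" for i in range(50)]
    inner_2 = [f"{name}.inside_section_2.task-{i + 1}" for i in range(50)]
    connect([dummies[-1]], inner_1, edges)
    connect([dummies[-2]], inner_2, edges)
    tasks = dummies + inner_1 + inner_2
    if use_dummy:
        join = f"{name}.join"  # hypothetical DummyOperator from suggestion 2)
        connect(inner_1, [join], edges)
        connect([join], inner_2, edges)
        tasks.append(join)
    else:
        connect(inner_1, inner_2, edges)  # 50 x 50 = 2500 edges
    return tasks


def count_edges(use_dummy: bool) -> int:
    """Mirror: start >> section_1 >> some-other-task >> section_2 >> end."""
    edges: Set[Edge] = set()
    section_1 = build_section("section_1", edges, use_dummy)
    section_2 = build_section("section_2", edges, use_dummy)
    connect(["start"], section_1, edges)
    connect(section_1, ["some-other-task"], edges)
    connect(["some-other-task"], section_2, edges)
    connect(section_2, ["end"], edges)
    return len(edges)


print(count_edges(use_dummy=False))  # 5696
print(count_edges(use_dummy=True))   # 896
```

Putting one join task between the inner groups replaces 50 x 50 child edges with 50 + 50 per section, at the cost of one extra task to execute.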
> > > > >>
> > > > >> On Sat, Aug 15, 2020 at 6:56 AM Gerard Casas Saez
> > > > >> <gc...@twitter.com.invalid> wrote:
> > > > >>
> > > > >>> Re graph times. That makes sense. Let me know what you find. We may
> > > > >>> be able to contribute to the lazy loading part.
> > > > >>>
> > > > >>> Looking forward to seeing the updated AIP!
> > > > >>>
> > > > >>>
> > > > >>> Gerard Casas Saez
> > > > >>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >>>
> > > > >>>
> > > > >>> On Fri, Aug 14, 2020 at 6:14 AM Kaxil Naik <ka...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Permissions granted, let me know if you face any issues.
> > > > >>>>
> > > > >>>> On Fri, Aug 14, 2020 at 1:10 PM Yu Qian <yu...@gmail.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> Hi, Kaxil, my ID for cwiki.apache.org is yuqian1990. Thank you!
> > > > >>>>>
> > > > >>>>> On Fri, Aug 14, 2020 at 7:35 PM Kaxil Naik <kaxilnaik@gmail.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> What's your ID? If you haven't created an account yet, please
> > > > >>>>>> create one at https://cwiki.apache.org/confluence/signup.action
> > > > >>>>>> and send us your ID and we will add permissions.
> > > > >>>>>>
> > > > >>>>>>> Thanks. I'll edit the AIP. May I request permission to edit it?
> > > > >>>>>>> My wiki user email is yuqian1990@gmail.com.
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>> On Fri, Aug 14, 2020 at 9:45 AM Yu Qian <yuqian1990@gmail.com>
> > > > >>>>>> wrote:
> > > > >>>>>>
> > > > >>>>>>> Re Xinbin: Thanks. I'll edit the AIP. May I request permission
> > > > >>>>>>> to edit it? My wiki user email is yuqian1990@gmail.com.
> > > > >>>>>>>
> > > > >>>>>>> Re Gerard: yes, the UI loads all the nodes as JSON from the web
> > > > >>>>>>> server at once. However, it only adds the top-level nodes and
> > > > >>>>>>> edges to the graph when the Graph View page is first opened, and
> > > > >>>>>>> then adds the expanded nodes to the graph as the user expands
> > > > >>>>>>> them. From what I've experienced with DAGs containing around 400
> > > > >>>>>>> tasks (not using TaskGroup or SubDagOperator), opening the whole
> > > > >>>>>>> DAG in Graph View usually takes 5 seconds. Less than 60ms of that
> > > > >>>>>>> is taken by loading the data from the webserver. The remaining
> > > > >>>>>>> 4.9s+ is taken by JavaScript functions in dagre-d3.min.js such as
> > > > >>>>>>> createNodes, createEdgeLabels, etc. and by rendering the graph.
> > > > >>>>>>> With TaskGroup being used to group tasks into a smaller number of
> > > > >>>>>>> top-level nodes, the amount of data loaded from the webserver will
> > > > >>>>>>> remain about the same compared to a flat DAG of the same size, but
> > > > >>>>>>> the number of nodes and edges that need to be plotted on the graph
> > > > >>>>>>> can be reduced significantly. So in theory this should speed up the
> > > > >>>>>>> time it takes to open Graph View even without lazy-loading the
> > > > >>>>>>> data (I'll experiment to find out). That said, if it comes to a
> > > > >>>>>>> point where lazy-loading helps, we can still implement it as an
> > > > >>>>>>> improvement.
> > > > >>>>>>>
> > > > >>>>>>> Re James: the Tree View looks as if all the groups are fully
> > > > >>>>>>> expanded (because under the hood all the tasks are in a single
> > > > >>>>>>> DAG). I'm less worried about Tree View at the moment because it
> > > > >>>>>>> already has a mechanism for collapsing tasks by the dependency
> > > > >>>>>>> tree. That said, the Tree View can definitely be improved too with
> > > > >>>>>>> TaskGroup (e.g. collapse tasks in the same TaskGroup when Tree
> > > > >>>>>>> View is first opened).
> > > > >>>>>>>
> > > > >>>>>>> For both suggestions, implementing them doesn't require
> > > > >>>>>>> fundamental changes to the idea. I think we can have a basic
> > > > >>>>>>> working TaskGroup first, and then improve it incrementally in
> > > > >>>>>>> several PRs as we get more feedback from the community. What do
> > > > >>>>>>> you think?
> > > > >>>>>>>
> > > > >>>>>>> Qian
> > > > >>>>>>>
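
To make the node/edge argument above concrete, here is a minimal Python sketch (not the actual Graph View or dagre-d3 code) that collapses every TaskGroup into a single node and counts what is left to draw. It assumes a plain dict mapping each task_id to its top-level group_id, or None for ungrouped tasks; `collapse_graph` and all task and group names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def collapse_graph(
    edges: Iterable[Edge], group_of: Dict[str, Optional[str]]
) -> Tuple[Set[str], Set[Edge]]:
    """Return the nodes and edges left to draw when every group is collapsed.

    group_of maps each task_id to its top-level group_id, or None if the task
    is not in any group. A collapsed group becomes one node named after its
    group_id; edges inside a group vanish and duplicate edges are merged.
    """
    def node(task_id: str) -> str:
        group = group_of.get(task_id)
        return task_id if group is None else group

    nodes = {node(task_id) for task_id in group_of}
    collapsed: Set[Edge] = set()
    for upstream, downstream in edges:
        a, b = node(upstream), node(downstream)
        if a != b:  # skip edges that start and end in the same group
            collapsed.add((a, b))
    return nodes, collapsed


# Hypothetical DAG: "start", then 4 groups of 100 chained tasks, then "end".
group_of: Dict[str, Optional[str]] = {"start": None, "end": None}
edges: List[Edge] = []
for g in range(4):
    ids = [f"g{g}.t{i}" for i in range(100)]
    for task_id in ids:
        group_of[task_id] = f"g{g}"
    edges += list(zip(ids, ids[1:]))  # chain inside the group
    edges += [("start", ids[0]), (ids[-1], "end")]

nodes, collapsed = collapse_graph(edges, group_of)
print(len(group_of), len(edges))   # 402 404 (flat graph)
print(len(nodes), len(collapsed))  # 6 8 (all groups collapsed)
```

The input data is identical in both cases; only what has to be laid out and rendered shrinks, which matches the distinction drawn in the message between webserver load time and client-side rendering time.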
> > > > >>>>>>>
> > > > >>>>>>> On Wed, Aug 12, 2020 at 9:15 AM James Coder <jcoder01@gmail.com>
> > > > >>>>>>> wrote:
> > > > >>>>>>>
> > > > >>>>>>>> I agree this looks great. One question: how does the Tree View
> > > > >>>>>>>> look?
> > > > >>>>>>>>
> > > > >>>>>>>> James Coder
> > > > >>>>>>>>
> > > > >>>>>>>>> On Aug 11, 2020, at 6:48 PM, Gerard Casas Saez
> > > > >>>>>>>>> <gcasassaez@twitter.com.invalid> wrote:
> > > > >>>>>>>>>
> > > > >>>>>>>>> First of all, this is awesome!!
> > > > >>>>>>>>>
> > > > >>>>>>>>> Secondly, checking your UI code, it seems you are loading all
> > > > >>>>>>>>> operators at once. I wonder if we can load them as needed (i.e.
> > > > >>>>>>>>> load them whenever we click the TaskGroup). Some of our DAGs are so
> > > > >>>>>>>>> large that they take forever to load in the Graph View, so I'm
> > > > >>>>>>>>> worried this will still be an issue here. It may be easily
> > > > >>>>>>>>> solvable by implementing lazy loading of the graph. Not sure how
> > > > >>>>>>>>> easy that is to implement/add to the UI extension (and I don't
> > > > >>>>>>>>> want to push for early optimization, as it's the root of all
> > > > >>>>>>>>> evil).
> > > > >>>>>>>>> Gerard Casas Saez
> > > > >>>>>>>>> Twitter | Cortex | @casassaez <http://twitter.com/casassaez>
> > > > >>>>>>>>>
> > > > >>>>>>>>>
> > > > >>>>>>>>>> On Tue, Aug 11, 2020 at 10:35 AM Xinbin Huang <bin.huangxb@gmail.com>
> > > > >>>>>>>>>> wrote:
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Hi Yu,
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Thank you so much for taking this on. I was fairly distracted
> > > > >>>>>>>>>> previously and didn't have the time to update the proposal. In
> > > > >>>>>>>>>> fact, after discussing with Ash, Kaxil and Daniel, the direction
> > > > >>>>>>>>>> of this AIP has changed to favor the concept of TaskGroup instead
> > > > >>>>>>>>>> of rewriting SubDagOperator (though it may make sense to deprecate
> > > > >>>>>>>>>> SubDag at a future date).
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Your PR is amazing and it implements the desired features. I
> > > > >>>>>>>>>> think we can focus on your new PR instead. Do you mind updating
> > > > >>>>>>>>>> the AIP based on what you have done in your PR?
> > > > >>>>>>>>>>
> > > > >>>>>>>>>> Best,
> > > > >>>>>>>>>> Bin
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>
> > > > >>>>>>>>>>> On Tue, Aug 11, 2020 at 7:11 AM Yu Qian <yuqian1990@gmail.com>
> > > > >>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Hi all, I've added the basic UI changes to my proposed
> > > > >>>>>>>>>>> implementation of TaskGroup as a UI grouping concept:
> > > > >>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> I think Chris had a pretty good specification of TaskGroup, so
> > > > >>>>>>>>>>> I'm quoting it here. The only thing I don't fully agree with is
> > > > >>>>>>>>>>> the restriction "... *cannot* have dependencies between a Task in
> > > > >>>>>>>>>>> a TaskGroup and either a Task in a different TaskGroup or a Task
> > > > >>>>>>>>>>> not in any group". I think this is over-restrictive. Since
> > > > >>>>>>>>>>> TaskGroup is a UI concept, tasks can have dependencies on tasks in
> > > > >>>>>>>>>>> other TaskGroups or not in any TaskGroup. In my PR, this is
> > > > >>>>>>>>>>> allowed. The graph edges update accordingly when TaskGroups are
> > > > >>>>>>>>>>> expanded/collapsed. TaskGroup only helps make the UI look less
> > > > >>>>>>>>>>> crowded. Under the hood, everything is still a DAG of tasks and
> > > > >>>>>>>>>>> edges, so things work normally. Here's a screenshot
> > > > >>>>>>>>>>> <https://raw.githubusercontent.com/yuqian90/airflow/gif_for_demo/airflow/www/static/screen-shot-short.gif>
> > > > >>>>>>>>>>> of the UI interaction.
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>> - You *can* have dependencies between Tasks in the same
> > > > >>>>>>>>>>>   TaskGroup, but *cannot* have dependencies between a Task in a
> > > > >>>>>>>>>>>   TaskGroup and either a Task in a different TaskGroup or a Task
> > > > >>>>>>>>>>>   not in any group
> > > > >>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either
> > > > >>>>>>>>>>>   other TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>> - The UI will by default render a TaskGroup as a single
> > > > >>>>>>>>>>>   "object", but one which you can expand or zoom into in some way
> > > > >>>>>>>>>>> - You'd need some way to determine what the "status" of a
> > > > >>>>>>>>>>>   TaskGroup was, at least for UI display purposes
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> Regarding Jake's comment, I agree it's possible to implement the
> > > > >>>>>>>>>>> "retrying tasks in a group" pattern he mentioned as an optional
> > > > >>>>>>>>>>> feature of TaskGroup, although that may go against having
> > > > >>>>>>>>>>> TaskGroup as a pure UI concept. For the motivating example Jake
> > > > >>>>>>>>>>> provided, I suggest implementing both SubmitLongRunningJobTask
> > > > >>>>>>>>>>> and PollJobStatusSensor in a single operator. It can do something
> > > > >>>>>>>>>>> like what BaseSensorOperator.execute() does in "reschedule" mode,
> > > > >>>>>>>>>>> i.e. it first executes some code to submit the long-running job to
> > > > >>>>>>>>>>> the external service and stores the state (e.g. in XCom), then
> > > > >>>>>>>>>>> reschedules itself. Subsequent runs then poke for the completion
> > > > >>>>>>>>>>> state.
> > > > >>>>>>>>>>>
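
A rough simulation of the single-operator suggestion above, written in plain Python so it runs without Airflow. The `state` dict stands in for XCom, `RescheduleLater` stands in for the signal a sensor in reschedule mode uses to give up its slot (Airflow's `AirflowRescheduleException`), and `FakeJobService`, `SubmitAndPollTask` and `run_until_done` are hypothetical names. It sketches the control flow only; it is not BaseSensorOperator's implementation.

```python
import itertools
from typing import Dict


class FakeJobService:
    """Hypothetical external service whose jobs finish after N status checks."""

    def __init__(self, polls_until_done: int) -> None:
        self._ids = itertools.count(1)
        self._remaining: Dict[str, int] = {}
        self.polls_until_done = polls_until_done

    def submit(self) -> str:
        job_id = f"job-{next(self._ids)}"
        self._remaining[job_id] = self.polls_until_done
        return job_id

    def is_done(self, job_id: str) -> bool:
        self._remaining[job_id] -= 1
        return self._remaining[job_id] <= 0


class RescheduleLater(Exception):
    """Stand-in for the signal that frees the worker slot until the next poke."""


class SubmitAndPollTask:
    """Submits the job on its first run, then only pokes on rescheduled runs."""

    def __init__(self, service: FakeJobService, state: Dict[str, str]) -> None:
        self.service = service
        self.state = state  # plays the role of XCom: survives reschedules

    def execute(self) -> str:
        if "job_id" not in self.state:        # first run: submit and persist
            self.state["job_id"] = self.service.submit()
        job_id = self.state["job_id"]
        if not self.service.is_done(job_id):  # not finished: reschedule
            raise RescheduleLater(job_id)
        return job_id                         # finished: the task succeeds


def run_until_done(task: SubmitAndPollTask, max_runs: int = 10) -> int:
    """Simulate the scheduler re-running a rescheduled task; return #runs."""
    for run in range(1, max_runs + 1):
        try:
            task.execute()
            return run
        except RescheduleLater:
            continue
    raise RuntimeError("job did not finish within max_runs")


service = FakeJobService(polls_until_done=3)
state: Dict[str, str] = {}
runs = run_until_done(SubmitAndPollTask(service, state))
print(runs, state)  # 3 {'job_id': 'job-1'}
```

Because the job id is persisted before the first poke, rescheduled runs only poll: the task runs three times but the job is submitted once.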
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>> On Thu, Aug 6, 2020 at 2:08 AM Jacob Ferriero
> > > > >>>>>>>>>>> <jferriero@google.com.invalid> wrote:
> > > > >>>>>>>>>>>
> > > > >>>>>>>>>>>> I really like this idea of a TaskGroup container, as I think it
> > > > >>>>>>>>>>>> will be much easier to use than SubDag.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd like to propose an optional behavior for special retry
> > > > >>>>>>>>>>>> mechanics via a TaskGroup.retry_all property. This way I could
> > > > >>>>>>>>>>>> use TaskGroup to replace my favorite use of SubDag: atomically
> > > > >>>>>>>>>>>> retrying tasks of the pattern "act on external state, then
> > > > >>>>>>>>>>>> reschedule-poll until the desired state is reached".
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> The motivating use case I have for a SubDag is a very simple
> > > > >>>>>>>>>>>> two-task group: [SubmitLongRunningJobTask >> PollJobStatusSensor].
> > > > >>>>>>>>>>>> I use SubDag because it gives me an easy way to retry the
> > > > >>>>>>>>>>>> SubmitJobTask if something about the PollJobSensor fails. This
> > > > >>>>>>>>>>>> pattern would be really nice for jobs that are expected to run a
> > > > >>>>>>>>>>>> long time (because the sensor can use reschedule mode, freeing up
> > > > >>>>>>>>>>>> slots) but might fail for a retryable reason. However, using
> > > > >>>>>>>>>>>> SubDag to meet this use case defeats the purpose, because SubDag
> > > > >>>>>>>>>>>> infamously
> > > > >>>>>>>>>>>> <https://medium.com/@team_24989/fixing-subdagoperator-deadlock-in-airflow-6c64312ebb10>
> > > > >>>>>>>>>>>> blocks a "controller" slot for the entire duration. This may feel
> > > > >>>>>>>>>>>> like cyclic behavior, but in reality it is very common for a
> > > > >>>>>>>>>>>> single operator to submit a job and wait until it is done. We
> > > > >>>>>>>>>>>> could use this to refactor many operators (e.g. BQ, Dataproc,
> > > > >>>>>>>>>>>> Dataflow) to be implemented as TaskGroup[SubmitTask >> PollTask]
> > > > >>>>>>>>>>>> with an optional reschedule mode if the user knows that the job
> > > > >>>>>>>>>>>> may take a long time.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> I'd be happy to do the development work on adding this specific
> > > > >>>>>>>>>>>> retry behavior to TaskGroup once the base concept is
> > > > >>>>>>>>>>>> implemented, if others in the community would find this a useful
> > > > >>>>>>>>>>>> feature.
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>> Jake
> > > > >>>>>>>>>>>>
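
TaskGroup.retry_all is only a proposal in this message, so the following is a hypothetical sketch of the behaviour it describes: if any task in the group fails, the whole group reruns from its first task. Plain callables stand in for operators, and `run_group_with_retry_all` is an illustrative name, not an Airflow API.

```python
from typing import Callable, Iterator, List


def run_group_with_retry_all(tasks: List[Callable[[], None]], retries: int) -> int:
    """Run tasks in order; if any raises, rerun the whole group from the start.

    Returns the attempt number that succeeded, or re-raises the last error
    once `retries` extra attempts have been used up.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            for task in tasks:
                task()
            return attempt
        except Exception:
            if attempt > retries:
                raise


log: List[str] = []
poll_results: Iterator[bool] = iter([False, True])  # first poll fails


def submit() -> None:
    log.append("submit")


def poll() -> None:
    log.append("poll")
    if not next(poll_results):
        raise RuntimeError("job failed for a retryable reason")


attempts = run_group_with_retry_all([submit, poll], retries=2)
print(attempts, log)  # 2 ['submit', 'poll', 'submit', 'poll']
```

The second `submit` in the log is the point of the proposal: a failed poll re-triggers the submission instead of retrying the sensor on its own.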
> > > > >>>>>>>>>>>> On Tue, Aug 4, 2020 at 10:07 AM Jarek Potiuk
> > > > >>>>>>>>>>>> <Jarek.Potiuk@polidea.com> wrote:
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>>> All for it :) . I think we are getting closer to having regular
> > > > >>>>>>>>>>>>> planning, taking a structured approach to 2.0 and starting a
> > > > >>>>>>>>>>>>> task force for it soon, so I think it should be perfectly fine
> > > > >>>>>>>>>>>>> to discuss and even start implementing what's beyond, as soon as
> > > > >>>>>>>>>>>>> we make sure that we are prioritizing 2.0 work.
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> J,
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> On Tue, Aug 4, 2020 at 12:09 PM Yu Qian <yuqian1990@gmail.com>
> > > > >>>>>>>>>>>>> wrote:
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Hi Jarek,
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I agree we should not change the behaviour of the existing
> > > > >>>>>>>>>>>>>> SubDagOperator until Airflow 2.1. Is it okay to continue the
> > > > >>>>>>>>>>>>>> discussion about TaskGroup as a brand new concept/feature,
> > > > >>>>>>>>>>>>>> independent of the existing SubDagOperator? In other words,
> > > > >>>>>>>>>>>>>> shall we add TaskGroup as a UI grouping concept like Ash
> > > > >>>>>>>>>>>>>> suggested, and not touch SubDagOperator at all? Whenever we are
> > > > >>>>>>>>>>>>>> ready with TaskGroup, we can then deprecate SubDagOperator in
> > > > >>>>>>>>>>>>>> Airflow 2.1.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I really like Ash's idea of simplifying the SubDagOperator idea
> > > > >>>>>>>>>>>>>> into a simple UI grouping concept. I think Xinbin's idea of
> > > > >>>>>>>>>>>>>> "reattaching all the tasks to the root DAG" is the way to go.
> > > > >>>>>>>>>>>>>> And I see James pointed out that we need some helper functions
> > > > >>>>>>>>>>>>>> to simplify setting dependencies on a TaskGroup. Xinbin put up a
> > > > >>>>>>>>>>>>>> pretty elegant example in his PR
> > > > >>>>>>>>>>>>>> <https://github.com/apache/airflow/pull/9243>. I think having
> > > > >>>>>>>>>>>>>> TaskGroup as a UI concept should be a relatively small change.
> > > > >>>>>>>>>>>>>> We can simplify Xinbin's PR further, so I put up this
> > > > >>>>>>>>>>>>>> alternative proposal here:
> > > > >>>>>>>>>>>>>> https://github.com/apache/airflow/pull/10153
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> I have not made any UI changes due to my lack of experience
> > > > >>>>>>>>>>>>>> with web UIs. If anyone's interested, please take a look at the
> > > > >>>>>>>>>>>>>> PR.
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> Qian
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>> On Mon, Jun 22, 2020 at 5:15 AM Jarek Potiuk
> > > > >>>>>>>>>>>>>> <Jarek.Potiuk@polidea.com> wrote:
> > > > >>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Similar point here to the other ideas that are popping up.
> > > > >>>>>>>>>>>>>>> Maybe we should just focus on completing 2.0 and move all
> > > > >>>>>>>>>>>>>>> discussions about further improvements to 2.1? While those are
> > > > >>>>>>>>>>>>>>> important discussions (and we should continue them in the near
> > > > >>>>>>>>>>>>>>> future!), I think at this point delivering 2.0 in its current
> > > > >>>>>>>>>>>>>>> shape should be our focus.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> J.
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 6:35 PM Xinbin Huang
> > > > >>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Hi Daniel,
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> I agree that the TaskGroup should have the same API as a DAG
> > > > >>>>>>>>>>>>>>>> object where task dependencies are concerned, but it will not
> > > > >>>>>>>>>>>>>>>> have anything related to actual execution or scheduling. I
> > > > >>>>>>>>>>>>>>>> will update the AIP accordingly over the weekend.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> We could even make a “DAGTemplate” object s.t. when you
> > > > >>>>>>>>>>>>>>>>> import the object you can import it with parameters to
> > > > >>>>>>>>>>>>>>>>> determine the shape of the DAG.
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> Can you elaborate a bit more on this? Does it serve a similar
> > > > >>>>>>>>>>>>>>>> purpose to a DAG factory function?
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 9:13 AM Daniel Imberman
> > > > >>>>>>>>>>>>>>>> <daniel.imberman@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Why not give the TaskGroup the same API as a DAG object (e.g.
> > > > >>>>>>>>>>>>>>>>> the bitwise operator for task dependencies)? We could even
> > > > >>>>>>>>>>>>>>>>> make a “DAGTemplate” object s.t. when you import the object
> > > > >>>>>>>>>>>>>>>>> you can import it with parameters to determine the shape of
> > > > >>>>>>>>>>>>>>>>> the DAG.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 8:54 PM, Xinbin Huang
> > > > >>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>> The TaskGroup will not take a schedule interval as a
> > > > >>>>>>>>>>>>>>>>> parameter itself; it depends on the DAG it is attached to. In
> > > > >>>>>>>>>>>>>>>>> my opinion, the TaskGroup will only contain a group of tasks
> > > > >>>>>>>>>>>>>>>>> with interdependencies, and the TaskGroup behaves like a task.
> > > > >>>>>>>>>>>>>>>>> It doesn't contain any execution/scheduling logic (i.e.
> > > > >>>>>>>>>>>>>>>>> schedule_interval, concurrency, max_active_runs, etc.) like a
> > > > >>>>>>>>>>>>>>>>> DAG does.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> For example, there is the scenario that the schedule interval
> > > > >>>>>>>>>>>>>>>>>> of DAG is 1 hour and the schedule interval of TaskGroup is 20
> > > > >>>>>>>>>>>>>>>>>> min.
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> I am curious why you ask this. Is this a use case that you
> > > > >>>>>>>>>>>>>>>>> want to achieve?
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:59 PM 蒋晓峰
> > > > >>>>>>>>>>>>>>>>> <thanosxnicholas@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>> When using TaskGroup, is the schedule interval of the
> > > > >>>>>>>>>>>>>>>>>> TaskGroup the same as that of the parent DAG? My main concern
> > > > >>>>>>>>>>>>>>>>>> is whether the schedule interval of a TaskGroup could be
> > > > >>>>>>>>>>>>>>>>>> different from that of the DAG. For example, consider the
> > > > >>>>>>>>>>>>>>>>>> scenario where the schedule interval of the DAG is 1 hour and
> > > > >>>>>>>>>>>>>>>>>> the schedule interval of the TaskGroup is 20 min.
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>>>>>>> Nicholas
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 10:30 AM Xinbin Huang
> > > > >>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Hi Nicholas,
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> I am not sure about the old behavior of SubDagOperator; maybe
> > > > >>>>>>>>>>>>>>>>>>> it will throw an error? But in the original proposal, the
> > > > >>>>>>>>>>>>>>>>>>> subdag's schedule_interval will be ignored. Or, if we decide
> > > > >>>>>>>>>>>>>>>>>>> to use TaskGroup to replace SubDag, there will be no subdag
> > > > >>>>>>>>>>>>>>>>>>> schedule_interval at all.
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 6:21 PM 蒋晓峰
> > > > >>>>>>>>>>>>>>>>>>> <thanosxnicholas@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Hi Bin,
> > > > >>>>>>>>>>>>>>>>>>>> Thanks for your good proposal. I was unsure whether the
> > > > >>>>>>>>>>>>>>>>>>>> schedule interval of a SubDAG can be different from that of
> > > > >>>>>>>>>>>>>>>>>>>> the parent DAG. I have discussed the schedule interval of
> > > > >>>>>>>>>>>>>>>>>>>> SubDAGs with Jiajie Zhong. If the SubDagOperator has a
> > > > >>>>>>>>>>>>>>>>>>>> different schedule interval, what will happen when the
> > > > >>>>>>>>>>>>>>>>>>>> scheduler schedules the parent DAG?
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> Regards,
> > > > >>>>>>>>>>>>>>>>>>>> Nicholas Jiang
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>> On Thu, Jun 18, 2020 at 8:04 AM Xinbin Huang
> > > > >>>>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Thank you, Max, Kaxil, and everyone, for your feedback!
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> I have rethought the concept of subdags and task groups. I
> > > > >>>>>>>>>>>>>>>>>>>>> think the better way to approach this is to remove subdags
> > > > >>>>>>>>>>>>>>>>>>>>> entirely and introduce the concept of a TaskGroup, which is
> > > > >>>>>>>>>>>>>>>>>>>>> a container of tasks along with their dependencies *without
> > > > >>>>>>>>>>>>>>>>>>>>> execution/scheduling logic as a DAG*. Its only purpose is
> > > > >>>>>>>>>>>>>>>>>>>>> to group a list of tasks, but you still need to add it to a
> > > > >>>>>>>>>>>>>>>>>>>>> DAG for execution.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Here is a small code snippet.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>>>>>>>>>> from airflow import DAG
> > > > >>>>>>>>>>>>>>>>>>>>> from airflow.operators.dummy_operator import DummyOperator
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> class TaskGroup:
> > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > >>>>>>>>>>>>>>>>>>>>>     A TaskGroup contains a group of tasks.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>     If default_args is missing, it will take default args
> > > > >>>>>>>>>>>>>>>>>>>>>     from the DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>     """
> > > > >>>>>>>>>>>>>>>>>>>>>     def __init__(self, group_id, default_args=None):
> > > > >>>>>>>>>>>>>>>>>>>>>         pass
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>>>>>>>>>> You can add tasks to a task group similar to adding tasks to
> > > > >>>>>>>>>>>>>>>>>>>>> a DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> This can be declared in a separate file from the DAG file.
> > > > >>>>>>>>>>>>>>>>>>>>> """
> > > > >>>>>>>>>>>>>>>>>>>>> # default_args, task1 and task2 are defined elsewhere
> > > > >>>>>>>>>>>>>>>>>>>>> download_group = TaskGroup(group_id='download',
> > > > >>>>>>>>>>>>>>>>>>>>>                            default_args=default_args)
> > > > >>>>>>>>>>>>>>>>>>>>> download_group.add_task(task1)
> > > > >>>>>>>>>>>>>>>>>>>>> task2.dag = download_group  # mirrors assigning a task to a DAG
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> with download_group:
> > > > >>>>>>>>>>>>>>>>>>>>>     task3 = DummyOperator(task_id='task3')
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> [task1, task2] >> task3
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> """Add it to a DAG for execution"""
> > > > >>>>>>>>>>>>>>>>>>>>> with DAG(dag_id='start_download_dag', default_args=default_args,
> > > > >>>>>>>>>>>>>>>>>>>>>          schedule_interval='@daily') as dag:  # other args elided
> > > > >>>>>>>>>>>>>>>>>>>>>     start = DummyOperator(task_id='start')
> > > > >>>>>>>>>>>>>>>>>>>>>     start >> download_group
> > > > >>>>>>>>>>>>>>>>>>>>>     # this is equivalent to
> > > > >>>>>>>>>>>>>>>>>>>>>     # start >> [task1, task2] >> task3
> > > > >>>>>>>>>>>>>>>>>>>>> ```
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> With this, we can still reuse a group of tasks and set
> > > > >>>>>>>>>>>>>>>>>>>>> dependencies between them; it avoids the boilerplate code
> > > > >>>>>>>>>>>>>>>>>>>>> of SubDagOperator, and we can declare dependencies as
> > > > >>>>>>>>>>>>>>>>>>>>> `task >> task_group >> task`.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Migration-wise, we can introduce it before Airflow 2.0 and
> > > > >>>>>>>>>>>>>>>>>>>>> allow a gradual transition. Then we can decide whether we
> > > > >>>>>>>>>>>>>>>>>>>>> still want to keep the SubDagOperator or simply remove it.
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Any thoughts?
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> Cheers,
> > > > >>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>
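
One way the proposed `start >> download_group` could be resolved, sketched with minimal stand-in classes rather than Airflow's BaseOperator: an edge into a group attaches to the group's roots (tasks with no upstream inside the group), and an edge out of a group leaves from its leaves. The `Task` and `TaskGroup` classes here are hypothetical, and computing roots and leaves at the moment `>>` is evaluated (so dependencies inside the group must be declared first) is a design choice of this sketch, not something the proposal specifies.

```python
from typing import List, Set, Union


class Task:
    """Minimal stand-in for an operator: an id plus upstream/downstream sets."""

    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.upstream: Set["Task"] = set()
        self.downstream: Set["Task"] = set()

    def __rshift__(self, other):
        set_downstream(self, other)
        return other

    def __rrshift__(self, other):  # handles `[t1, t2] >> t3`
        set_downstream(other, self)
        return self


class TaskGroup:
    """Hypothetical group: holds tasks, has no scheduling logic of its own."""

    def __init__(self, group_id: str) -> None:
        self.group_id = group_id
        self.tasks: List[Task] = []

    def add_task(self, task: Task) -> Task:
        self.tasks.append(task)
        return task

    def roots(self) -> List[Task]:
        members = set(self.tasks)
        return [t for t in self.tasks if not t.upstream & members]

    def leaves(self) -> List[Task]:
        members = set(self.tasks)
        return [t for t in self.tasks if not t.downstream & members]

    def __rshift__(self, other):
        set_downstream(self, other)
        return other

    def __rrshift__(self, other):
        set_downstream(other, self)
        return self


Node = Union[Task, TaskGroup, list]


def _tails(x: Node) -> List[Task]:
    """Tasks an edge starts from: a group's leaves, or the task(s) given."""
    if isinstance(x, TaskGroup):
        return x.leaves()
    if isinstance(x, list):
        return [t for item in x for t in _tails(item)]
    return [x]


def _heads(x: Node) -> List[Task]:
    """Tasks an edge points to: a group's roots, or the task(s) given."""
    if isinstance(x, TaskGroup):
        return x.roots()
    if isinstance(x, list):
        return [t for item in x for t in _heads(item)]
    return [x]


def set_downstream(up: Node, down: Node) -> None:
    for u in _tails(up):
        for d in _heads(down):
            u.downstream.add(d)
            d.upstream.add(u)


download_group = TaskGroup("download")
task1 = download_group.add_task(Task("task1"))
task2 = download_group.add_task(Task("task2"))
task3 = download_group.add_task(Task("task3"))
[task1, task2] >> task3         # dependencies inside the group first

start, end = Task("start"), Task("end")
start >> download_group >> end  # task >> task_group >> task

print(sorted(t.task_id for t in start.downstream))  # ['task1', 'task2']
print(sorted(t.task_id for t in end.upstream))      # ['task3']
```

Running it wires start to task1 and task2 and task3 to end, which is the expansion the snippet's comment describes for `start >> download_group`, extended with a downstream task.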
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>> On Wed, Jun 17, 2020 at 7:37 AM Maxime Beauchemin
> > > > >>>>>>>>>>>>>>>>>>>>> <maximebeauchemin@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> +1, the proposal looks good.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> The original intention was really to have task groups and a
> > > > >>>>>>>>>>>>>>>>>>>>>> zoom-in/out in the UI. The original reasoning was to reuse
> > > > >>>>>>>>>>>>>>>>>>>>>> the DAG object since it is a group of tasks, but as
> > > > >>>>>>>>>>>>>>>>>>>>>> highlighted here it does create underlying confusion, since
> > > > >>>>>>>>>>>>>>>>>>>>>> a DAG is much more than just a group of tasks.
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> Max
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 15, 2020 at 2:43 AM Poornima Joshi
> > > > >>>>>>>>>>>>>>>>>>>>>> <joshipoornima06@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thank you for your email.
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> On Sat, Jun 13, 2020 at 12:18 AM Xinbin Huang
> > > > >>>>>>>>>>>>>>>>>>>>>>> <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*:
> This
> > > > >>>>>>>>>>>>>> rewrites
> > > > >>>>>>>>>>>>>>>> the
> > > > >>>>>>>>>>>>>>>>>>>>>>>> *DagBag.bag_dag*
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> method to unpack subdag while parsing, and
> > > > >>>>>>>>>> it
> > > > >>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>> give a
> > > > >>>>>>>>>>>>>>>>>>>> flat
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> structure at
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already
> > > > >>>>>>>>>> does
> > > > >>>>>>>>>>>>> this I
> > > > >>>>>>>>>>>>>>>>> think.
> > > > >>>>>>>>>>>>>>>>>> At
> > > > >>>>>>>>>>>>>>>>>>>>> least
> > > > >>>>>>>>>>>>>>>>>>>>>>> if
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I've understood your idea here correctly.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> I am not sure about serialized_dag
> > > > >>>>>>>>>>> representation,
> > > > >>>>>>>>>>>>> but
> > > > >>>>>>>>>>>>>> at
> > > > >>>>>>>>>>>>>>>>> least
> > > > >>>>>>>>>>>>>>>>>>> it
> > > > >>>>>>>>>>>>>>>>>>>>> will
> > > > >>>>>>>>>>>>>>>>>>>>>>>> still keep the subdag entry in the DAG
> table?
> > > > >>>>>>>>>>>>>>>>>>>>>>>> In my proposal, as also in the draft PR, the idea is to *extract the tasks from the subdag and add them back to the root_dag.* So the runtime DAG graph will look exactly the same as without the subdag, but with metadata attached to those sections. This metadata will later be used for rendering in the UI. So after parsing (*DagBag.process_file()*), it will just output the *root_dag* instead of *root_dag + subdag + subdag + nested subdag* etc.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - e.g. section-1-* will have the metadata current_group=section-1, parent_group=<the-root-dag-id> (naming suggestions welcome). The reason for parent_group is that we can have nested groups and still be able to capture the dependency.
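A minimal sketch of that flattening step, written in plain Python rather than Airflow code: `Task`, `SubDag` and `flatten` are hypothetical names, and only the `current_group`/`parent_group` fields come from the proposal. Dependencies are omitted; the point is that tasks in nested groups still record their enclosing group, so the hierarchy survives even though the task list is flat.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Task:
    """Hypothetical task node carrying the proposed grouping metadata."""
    task_id: str
    current_group: Optional[str] = None
    parent_group: Optional[str] = None


@dataclass
class SubDag:
    """Hypothetical nested container: a named group of tasks and sub-groups."""
    group_id: str
    children: List[Union[Task, "SubDag"]] = field(default_factory=list)


def flatten(root_dag_id: str, children: List[Union[Task, SubDag]],
            current: Optional[str] = None,
            parent: Optional[str] = None) -> List[Task]:
    """Drain every SubDag into one flat task list for the root DAG.

    Each task records the group it sat in (current_group) and that group's
    enclosing group (parent_group); top-level groups point at root_dag_id.
    Dependencies are left out to keep the sketch short.
    """
    flat: List[Task] = []
    for child in children:
        if isinstance(child, SubDag):
            # A nested group's parent is the group we are in now,
            # or the root DAG itself at the top level.
            flat.extend(flatten(root_dag_id, child.children,
                                current=child.group_id,
                                parent=current or root_dag_id))
        else:
            flat.append(Task(child.task_id, current_group=current,
                             parent_group=parent))
    return flat


root = [
    Task("start"),
    SubDag("section-1", [
        Task("section-1-task-1"),
        SubDag("section-1a", [Task("section-1a-task-1")]),
    ]),
    Task("end"),
]
for t in flatten("my_dag", root):
    print(t.task_id, t.current_group, t.parent_group)
```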
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Runtime DAG:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> In the UI, by utilizing the metadata, what we would see is something like this, and then we can expand or zoom into it in some way.
> > > > >>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> The benefits I can see are:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> 1. We don't need to deal with the extra complexity of SubDag for execution and scheduling. It will be the same as not using SubDag.
> > > > >>>>>>>>>>>>>>>>>>>>>>>> 2. We still have the benefits of modularized and reusable dag code, and can declare dependencies between the modules. And with the new SubDagOperator (see the AIP or draft PR), we can use the same dag_factory function for generating one dag, a lot of dynamic dags, or a SubDag (in this case, it will just extract all underlying tasks and append them to the root dag).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - Then it gets to Ash's idea of replacing subdag with a simpler concept: the proposed change basically drains out the contents of a SubDag and becomes more like an ExtractSubdagTasksAndAppendToRootdagOperator (forgive me for the crazy name..). In this case, is it still necessary to keep the concept of subdag, as it is nothing more than a name?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> That's why the TaskGroup idea comes up. Thanks to Chris Palmer for helping conceptualize the functionality of TaskGroup; I will just paste it here.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but *cannot* have dependencies between a Task in a TaskGroup and either a Task in a different TaskGroup or a Task not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was at least for UI display purposes
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> I agree with Chris:
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the backend's view (scheduler & executor), I think TaskGroup should be ignored during execution (unless we decide to implement some metadata operations that allow starting/stopping a group of tasks etc.).
> > > > >>>>>>>>>>>>>>>>>>>>>>>> - From the UI's view, it should be able to pick up the individual tasks' statuses and then determine the TaskGroup's status.
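As a sketch of how the UI could derive a TaskGroup's status from its tasks' statuses, the hypothetical `group_state` helper below applies one possible precedence: any failure wins, then "all done", then "partially done". The state strings mirror common Airflow task states, but the rule itself is an assumption, and nothing here is consulted by the scheduler.

```python
from typing import Iterable, Optional

# State names follow common Airflow task states; the precedence below is an
# illustrative assumption, not how any Airflow release computes group status.
FAILED = {"failed", "upstream_failed"}
IN_FLIGHT = {"queued", "running", "up_for_retry"}
DONE = {"success", "skipped"}


def group_state(task_states: Iterable[Optional[str]]) -> Optional[str]:
    """Derive a display-only TaskGroup status from its tasks' statuses."""
    states = list(task_states)
    if not states:
        return None  # empty group: nothing to display
    if any(s in FAILED for s in states):
        return "failed"  # a single failure marks the whole group
    if all(s in DONE for s in states):
        return "success"
    if any(s in IN_FLIGHT or s in DONE for s in states):
        return "running"  # partially complete or still executing
    return None  # no task has been scheduled yet


print(group_state(["success", "skipped"]))  # success
print(group_state(["success", None]))       # running
```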
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 10:28 AM Daniel Imberman <daniel.imberman@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I hadn’t thought about using the `>>` operator to tie dags together, but I think that sounds pretty great! I wonder if we could essentially build in the ability to set dependencies on all starter tasks of that DAG.
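One way to read "set dependencies on all starter tasks" is sketched below, using hypothetical `Node` and `Unit` classes rather than Airflow's own: `upstream >> unit` adds an edge to every task in the unit that has no upstream inside the unit.

```python
from typing import List, Set, Union


class Node:
    """Hypothetical minimal task supporting Airflow-style `>>` chaining."""

    def __init__(self, task_id: str):
        self.task_id = task_id
        self.downstream: Set["Node"] = set()

    def __rshift__(self, other: Union["Node", "Unit"]):
        # Wiring into a Unit means wiring into each of its starter tasks.
        targets = other.starter_tasks() if isinstance(other, Unit) else [other]
        for target in targets:
            self.downstream.add(target)
        return other  # returning `other` lets node chains like a >> b >> c work


class Unit:
    """Hypothetical reusable bundle of tasks standing in for a sub-DAG."""

    def __init__(self, *nodes: Node):
        self.nodes: List[Node] = list(nodes)

    def starter_tasks(self) -> List[Node]:
        # A starter task has no upstream edge coming from inside the unit.
        fed = {d for n in self.nodes for d in n.downstream if d in self.nodes}
        return [n for n in self.nodes if n not in fed]


extract_a, extract_b, load = Node("extract_a"), Node("extract_b"), Node("load")
extract_a >> load
extract_b >> load
etl = Unit(extract_a, extract_b, load)

start = Node("start")
start >> etl
print(sorted(t.task_id for t in start.downstream))  # ['extract_a', 'extract_b']
```

The symmetric case, `unit >> downstream`, would wire from the unit's leaf tasks instead; it is left out to keep the sketch short.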
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I’m personally ok with SubDag being a mostly UI concept. It doesn’t need to execute separately; you’re just adding more tasks to the queue that will be executed when there are resources available.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:45 AM, Chris Palmer <chris@crpalmer.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> I agree that SubDAGs are an overly complex abstraction. I think what is needed/useful is a TaskGroup concept. At a high level, I think you want this functionality:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - Tasks can be added to a TaskGroup
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between Tasks in the same TaskGroup, but *cannot* have dependencies between a Task in a TaskGroup and either a Task in a different TaskGroup or a Task not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You *can* have dependencies between a TaskGroup and either other TaskGroups or Tasks not in any group
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - The UI will by default render a TaskGroup as a single "object", but which you expand or zoom into in some way
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> - You'd need some way to determine what the "status" of a TaskGroup was at least for UI display purposes
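The dependency rules in the list above can be expressed as a small check. This is a hypothetical helper over plain strings, not an Airflow API, and it assumes that a group-to-task edge is valid only when the task is ungrouped, since the list mentions no other case.

```python
from typing import Dict, Optional, Tuple

# An endpoint is ("task", task_id) or ("group", group_id).
Endpoint = Tuple[str, str]


def edge_allowed(src: Endpoint, dst: Endpoint,
                 group_of: Dict[str, Optional[str]]) -> bool:
    """Check one dependency against the TaskGroup rules listed above.

    group_of maps task_id -> group_id, with None for tasks not in any group.
    Group edges to tasks *inside* a group are rejected here because the rules
    only mention other groups and ungrouped tasks; that reading is an assumption.
    """
    (src_kind, src_id), (dst_kind, dst_id) = src, dst
    if src_kind == "task" and dst_kind == "task":
        # Both tasks in the same group, or both ungrouped.
        return group_of[src_id] == group_of[dst_id]
    if src_kind == "group" and dst_kind == "group":
        return src_id != dst_id  # a group cannot depend on itself
    # Mixed group/task edge: the task end must not belong to any group.
    task_id = dst_id if src_kind == "group" else src_id
    return group_of[task_id] is None


group_of = {"a1": "A", "a2": "A", "b1": "B", "solo": None}
print(edge_allowed(("task", "a1"), ("task", "a2"), group_of))    # True
print(edge_allowed(("task", "a1"), ("task", "b1"), group_of))    # False
print(edge_allowed(("group", "A"), ("task", "solo"), group_of))  # True
```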
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Not sure if it would need to be a top-level object with its own database table and model, or just another attribute on tasks. I think you could build it in a way such that, from the scheduler's point of view, a DAG with TaskGroups doesn't get treated any differently. So it really just becomes a shortcut for setting dependencies between sets of Tasks, and allows the UI to simplify the rendering of the DAG structure.
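The "shortcut" view can be sketched as a rewrite step that runs before the scheduler sees anything: `group_a >> group_b` becomes ordinary task edges from A's leaf tasks to B's root tasks. The helper names and the edge-set representation are assumptions made for illustration.

```python
from itertools import product
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (upstream_task_id, downstream_task_id)


def roots(tasks: List[str], edges: Set[Edge]) -> List[str]:
    """Group members with no upstream edge from another member."""
    inside = set(tasks)
    return [t for t in tasks if not any(u in inside and d == t for u, d in edges)]


def leaves(tasks: List[str], edges: Set[Edge]) -> List[str]:
    """Group members with no downstream edge to another member."""
    inside = set(tasks)
    return [t for t in tasks if not any(u == t and d in inside for u, d in edges)]


def expand_group_edge(upstream: List[str], downstream: List[str],
                      edges: Set[Edge]) -> Set[Edge]:
    """Rewrite `group_a >> group_b` as plain task edges: every leaf of A
    feeds every root of B, so no task in B starts before all of A is done."""
    return set(product(leaves(upstream, edges), roots(downstream, edges)))


edges = {("a1", "a2"), ("b1", "b2"), ("b1", "b3")}
group_a, group_b = ["a1", "a2"], ["b1", "b2", "b3"]
flat_edges = edges | expand_group_edge(group_a, group_b, edges)
print(sorted(flat_edges))
```

Because every task in A reaches some leaf of A, and every task in B is reachable from some root of B, this ordering holds for all pairs of tasks across the two groups, while the scheduler only ever handles the flat `flat_edges` set.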
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> Chris
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 12:12 PM Dan Davydov <ddavydov@twitter.com.invalid> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> Agree with James (and think it's actually the more important issue to fix), but I am still convinced Ash's idea is the right way forward (it just might require a bit more work to deprecate than adding visual grouping in the UI).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> There was a previous thread about this, FYI, with more context on why subdags are bad and potential solutions:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> https://www.mail-archive.com/dev@airflow.apache.org/msg01202.html
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> A solution I outline there to James's problem is, e.g., enabling the >> operator for Airflow operators to work with DAGs as well. I see this as separate from Ash's solution for DAG grouping in the UI, but as one of the two items required to replace all existing subdag functionality.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> I've been working with subdags for 3 years and they are always a giant pain to use. They are a constant source of user confusion and breakages during upgrades. Would love to see them gone :).
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 11:11 AM James Coder <jcoder01@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> I'm not sure I totally agree it's just a UI concept. I use the subdag operator to simplify dependencies too. If you have a group of tasks that need to finish before another group of tasks starts, using a subdag is a pretty quick way to set those dependencies, and I think it also makes the dag code easier to follow.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 9:53 AM Kyle Hamlin <hamlin.kn@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> I second Ash’s grouping concept.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> On Fri, Jun 12, 2020 at 5:10 AM Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Question:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Do we even need the SubDagOperator anymore?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Would removing it entirely and just replacing it with a UI grouping concept be conceptually simpler, less to get wrong, and closer to what users actually want to achieve with subdags?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> With your proposed change, tasks in subdags could start running in parallel (a good change) -- so should we not also just _entirely_ remove the concept of a sub dag and replace it with something simpler?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Problems with subdags (I think; I haven't used them extensively, so I may be wrong on some of these):
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own dag_id, but it has(?) to be of the form `parent_dag_id.subdag_id`.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - They need their own schedule_interval, but it has to match the parent dag.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - Sub dags can be paused on their own. (Does it make sense to do this? Pausing just a sub dag would mean the sub dag would never execute, so the SubDagOperator would fail too.)
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> - You had to choose the executor to operate a subdag with -- always a bit of a kludge.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thoughts?
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 12:01 pm, Ash Berlin-Taylor <ash@apache.org> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Work on sub-dags is much needed; I'm excited to see how this progresses.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag* method to unpack subdags while parsing, and it will give a flat structure at the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The serialized_dag representation already does this, I think. At least if I've understood your idea here correctly.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -ash
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Jun 12 2020, at 9:51 am, Xinbin Huang <bin.huangxb@gmail.com> wrote:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi everyone,
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Sending a message to everyone to collect feedback on AIP-34 on rewriting SubDagOperator. This was previously briefly mentioned in the discussion about what needs to be done for Airflow 2.0, and one of the ideas is to make SubDagOperator attach tasks back to the root DAG.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This AIP-34 focuses on solving SubDagOperator-related issues by reattaching all tasks back to the root dag while respecting dependencies during parsing. The original grouping effect in the UI will be achieved by grouping related tasks by metadata.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This also makes the dag_factory function more reusable, because you don't need to have parent_dag_name and child_dag_name in the function signature anymore.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Changes proposed:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Unpack SubDags during dag parsing*: This rewrites the *DagBag.bag_dag* method to unpack subdags while parsing, and it will give a flat structure at the task level
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Simplify SubDagOperator*: The new SubDagOperator acts like a container and most of the original methods are removed. The signature is also changed to *subdag_factory* with *subdag_args* and *subdag_kwargs*. This is similar to the PythonOperator signature.
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Add a TaskGroup model and add current_group & parent_group attributes to BaseOperator*: This metadata is used to group tasks for rendering at the UI level. It may potentially extend further to group arbitrary tasks outside the context of subdag to allow group-level operations (e.g. stop/trigger a group of tasks within the dag)
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> - *Webserver UI for SubDag*: Proposed UI modification to allow (un)collapsing a group of tasks in a flat structure, to pair with the first change, instead of the original hierarchical structure.
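The proposed container can be sketched as follows. `SubDagContainer`, `etl_factory` and the dict-based task specs are hypothetical; only the `subdag_factory`/`subdag_args`/`subdag_kwargs` parameter names come from the proposal, and the call pattern mirrors how PythonOperator invokes `python_callable(*op_args, **op_kwargs)`.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence

TaskSpec = Dict[str, Any]


def etl_factory(source: str, *, retries: int = 0) -> List[TaskSpec]:
    """Hypothetical reusable factory: no parent_dag_name/child_dag_name needed."""
    return [
        {"task_id": f"extract_{source}", "retries": retries},
        {"task_id": f"load_{source}", "retries": retries},
    ]


class SubDagContainer:
    """Hypothetical stand-in for the proposed, simplified SubDagOperator.

    The factory is called with subdag_args/subdag_kwargs, and its tasks are
    appended to the root DAG instead of running as a separate DAG.
    """

    def __init__(self, group_id: str,
                 subdag_factory: Callable[..., List[TaskSpec]],
                 subdag_args: Optional[Sequence[Any]] = None,
                 subdag_kwargs: Optional[Dict[str, Any]] = None):
        self.group_id = group_id
        self.subdag_factory = subdag_factory
        self.subdag_args = list(subdag_args or [])
        self.subdag_kwargs = dict(subdag_kwargs or {})

    def unpack_into(self, root_tasks: List[TaskSpec], root_dag_id: str) -> None:
        """Call the factory and append its tasks, tagged with group metadata."""
        for spec in self.subdag_factory(*self.subdag_args, **self.subdag_kwargs):
            root_tasks.append({**spec, "current_group": self.group_id,
                               "parent_group": root_dag_id})


root_tasks: List[TaskSpec] = [{"task_id": "start"}]
SubDagContainer("section-1", etl_factory, subdag_args=["orders"],
                subdag_kwargs={"retries": 2}).unpack_into(root_tasks, "my_dag")
print([t["task_id"] for t in root_tasks])  # ['start', 'extract_orders', 'load_orders']
```

Because `etl_factory` takes only its own arguments, the same function could also build a standalone DAG's tasks directly, which is the reusability point made earlier in the message.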
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please see related documents and PRs for details:
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> AIP: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-34+Rewrite+SubDagOperator
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Original Issue: https://github.com/apache/airflow/issues/8078
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Draft PR: https://github.com/apache/airflow/pull/9243
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Please let me know if there are any aspects that you agree/disagree with or need more clarification on (especially the third change regarding TaskGroup). Any comments are welcome and I am looking forward to them!
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>> Kyle Hamlin
> > > > >>>>>>>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>>>>>>>>> Thanks & Regards
> > > > >>>>>>>>>>>>>>>>>>>>>>> Poornima
> > > > >>>>>>>>>>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> Jarek Potiuk
> > > > >>>>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>>>>>>>>>>>>>>
> > > > >>>>>>>>>>>>>>> M: +48 660 796 129
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> --
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> Jarek Potiuk
> > > > >>>>>>>>>>>>> Polidea <https://www.polidea.com/> | Principal Software Engineer
> > > > >>>>>>>>>>>>>
> > > > >>>>>>>>>>>>> M: +48 660 796 129
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> --
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> *Jacob Ferriero*
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> Strategic Cloud Engineer: Data Engineering
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> jferriero@google.com
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>> 617-714-2509
> > > > >>>>>>>>>>>>
> > > > >>>>>>>>>>>